CONSERVING CONFLUENCE CURBS ILL-CONDITION


W.  Kahan* 

Abstract. Certain problems are ill-conditioned, in the sense that their solutions are hypersensitive to small changes in data, only because a slight change in data could cause those solutions to exhibit singular behaviour associated with various kinds of confluence. For example, an over- or under-determined linear system solved by least squares can be ill-conditioned only if there exist some small perturbations to its matrix which increase its nullity (i.e. diminish its rank); zeros of a polynomial can be ill-conditioned only if their multiplicities can be increased by very small perturbations of the polynomial's coefficients; eigenvalues of a non-Hermitian matrix can be ill-conditioned only if their algebraic multiplicities can be increased by very small perturbations of the matrix. When perturbations constrained to a small neighbourhood can be further constrained to maximize confluence, i.e. to maximize nullity (minimize rank) or maximize multiplicity, and when that maximized confluence can be increased again only by perturbations far beyond the small neighbourhood, then the slightly perturbed problems exhibit well-conditioned confluent solutions. Beyond these vague statements lie the shadows of numerical methods which may either eliminate ill-condition or, when ill-condition is persistent, illuminate its cause.


"Mother  may  I  go  to  swim?" 

"Yes, my darling daughter;

Hang  your  clothes  on  yon  tree  limb, 
But  don't  go  near  the  water." 

Introduction. Numerical calculations generally appear in the form

Compute $y = f(x)$

where f characterizes a class of problems and x represents the particular data. Commonly f is defined implicitly by a set of equations whose coefficients' values constitute x, and y is the solution of those equations. The equations are called ill-conditioned whenever there exist tiny perturbations $\delta x$ which cause huge changes $\delta y = f(x + \delta x) - f(x)$. To make this notion more precise we imagine x and y to reside in metric spaces (normed linear spaces are customary) and define a condition number

$$\gamma = \sup \|\delta y\| / \|\delta x\|$$

where the supremum is taken over all $\delta x$ in some neighbourhood of x. Thus, the condition number $\gamma$ is a Lipschitz constant:

$$\|f(x + \delta x) - f(x)\| \le \gamma\,\|\delta x\|.$$

The larger $\gamma$ is, the more ill-conditioned is the problem f near x. When $\gamma$ is infinite we sometimes say that f is ill-posed near x, though this term is reserved by some for discontinuous behaviour.




Non-differentiable functions f are so rarely encountered in practice that we might as well exploit the simplification afforded by constraining perturbations $\delta x$ to infinitesimal neighbourhoods. Now

$$f(x + \delta x) - f(x) \approx (\partial f/\partial x)\,\delta x$$

wherever the Fréchet derivative $\partial f/\partial x$ exists, in which case

$$\gamma = \|\partial f/\partial x\|;$$

here we use the induced norm for linear operators between two normed linear spaces.
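As a small illustration of our own (not from the paper), the condition number can be estimated by finite-difference quotients. The sketch assumes the scalar map $f(x) = \sqrt{x}$, the positive zero of $y^2 - x$, whose two zeros $\pm\sqrt{x}$ coalesce at $x = 0$; the quotient should track the exact derivative $1/(2\sqrt{x})$ and grow without bound as x approaches that confluence. The step size rule is an assumption chosen for the example.

```python
import math

def f(x):
    """Positive zero of y**2 - x; the zeros +sqrt(x) and -sqrt(x) coalesce at x = 0."""
    return math.sqrt(x)

def condition_estimate(func, x, rel_step=1e-6):
    """Finite-difference stand-in for |df/dx|, using a step proportional to x."""
    h = rel_step * x
    return abs(func(x + h) - func(x)) / h

estimates = {x: condition_estimate(f, x) for x in (1.0, 1e-2, 1e-4, 1e-6)}
for x, est in estimates.items():
    print(f"x = {x:g}: estimate {est:.6g}, exact 1/(2*sqrt(x)) = {1 / (2 * math.sqrt(x)):.6g}")
```

The estimate grows like $x^{-1/2}$: the closer the data lie to the place where $\partial f/\partial x$ becomes infinite, the larger $\gamma$.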

Since $\partial f/\partial x$ is usually differentiable too, it seems natural to guess that an ill-conditioned problem, with $\|\partial f/\partial x\|$ huge, probably has its data x near a place where $\partial f/\partial x$ becomes infinite or fails to exist. The locus of all such places is usually a manifold in x's space, and that manifold is the subject of this paper. Here are three examples:

When f represents solving a system of linear equations $Ay = b$ with square matrix A, so each point x in data-space has coordinates $(A, b)$, and when the infinitesimal neighbourhoods are generated by all infinitesimal $(\delta A, \delta b)$ without constraint, then the manifold where $\partial f/\partial x$ becomes infinite consists of just those points $x = (A, b)$ with singular A, since elsewhere $y = A^{-1}b$ varies by $\delta y = A^{-1}\delta b - A^{-1}\delta A\,A^{-1}b$, a bounded linear function of the infinitesimal perturbation $\delta x = (\delta A, \delta b)$. When f represents solving polynomial equations

$$y^n - \sum_{j=1}^{n} x_j\,y^{n-j} = 0,$$

so each point $x = (x_1, x_2, \ldots, x_n)$ in data-space is identified with a polynomial $x(y) = y^n - \sum_{j=1}^{n} x_j\,y^{n-j}$, the manifold where $\partial f/\partial x$ becomes infinite consists of just those polynomials x with some multiple zeros, since elsewhere each simple zero y of x varies by $\delta y = \sum_{j=1}^{n} y^{n-j}\,\delta x_j / x'(y)$. A similar situation arises when f represents solving eigenproblems for square matrices X: the eigenvalues and eigenvectors are well known to be differentiable functions of X's elements only when X's eigenvalues are distinct, so the manifold of interest consists of those matrices X with some multiple eigenvalues.
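The linear-system example can be checked numerically. This sketch, with arbitrary data of our choosing and Cramer's rule standing in for a real solver, compares the first-order change $\delta y = A^{-1}\delta b - A^{-1}\delta A\,A^{-1}b$ with the actual change for a perturbation of size about $10^{-7}$, and then shows how the amplification $|A_s^{-1}e_1|$ grows as the hypothetical matrix $A_s = \begin{pmatrix}1 & 1\\ 1 & 1+s\end{pmatrix}$ approaches a singular one.

```python
def solve2(A, b):
    """Solve the 2x2 system A y = b by Cramer's rule (A assumed nonsingular)."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    return [(b[0] * a22 - a12 * b[1]) / det, (a11 * b[1] - a21 * b[0]) / det]

def linearized_change(A, b, dA, db):
    """First-order change dy = A^{-1}(db - dA y) = A^{-1} db - A^{-1} dA A^{-1} b."""
    y = solve2(A, b)
    rhs = [db[i] - sum(dA[i][j] * y[j] for j in range(2)) for i in range(2)]
    return solve2(A, rhs)

def actual_change(A, b, dA, db):
    """f(x + dx) - f(x) with data x = (A, b) and perturbation dx = (dA, db)."""
    Ap = [[A[i][j] + dA[i][j] for j in range(2)] for i in range(2)]
    bp = [b[i] + db[i] for i in range(2)]
    return [p - q for p, q in zip(solve2(Ap, bp), solve2(A, b))]

t = 1e-7                                   # size of the (hypothetical) perturbation
A, b = [[1.0, 2.0], [3.0, 4.0]], [5.0, 6.0]
dA = [[0.3 * t, -0.1 * t], [0.2 * t, 0.5 * t]]
db = [0.4 * t, -0.7 * t]
lin = linearized_change(A, b, dA, db)
act = actual_change(A, b, dA, db)
print(lin, act)                            # agree up to terms of order t**2

def amplification(s):
    """|A_s^{-1} e_1| for A_s = [[1, 1], [1, 1 + s]], which becomes singular as s -> 0."""
    y = solve2([[1.0, 1.0], [1.0, 1.0 + s]], [1.0, 0.0])
    return (y[0] ** 2 + y[1] ** 2) ** 0.5

for s in (1e-1, 1e-3, 1e-5):
    print(s, amplification(s))
```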

One might be tempted to assign some pejorative adjective to that manifold on which $\partial f/\partial x$ fails to be finite. (There are precedents: in 1884 Sylvester assigned the word derogatory to certain matrices with multiple eigenvalues, and physicists almost universally apply the epithet degenerate to eigenvalues whose only flaw is their indistinguishability.) In so far as f is ill-behaved near that manifold, the more so as it is approached, the manifold warrants the name pejorative.* But in the last two examples above f will be found to behave very well on the manifold, except as x approaches certain sub-manifolds. More precisely, for almost all x on the pejorative manifold and for all infinitesimally nearby $x + \delta x$ also on that manifold, the difference $f(x + \delta x) - f(x)$ is a bounded linear function of $\delta x$, and the bound varies with x on the manifold in such a way that the bound can approach infinity only as x approaches some doubly pejorative sub-manifold on which the same kind of behaviour recurs. That phenomenon is what this paper is about.

*Pejorative: from the Latin pejorare, to make worse.


The paradox, that f can be well-behaved on a manifold in every open neighbourhood of which f is arbitrarily ill-behaved, would be uninteresting but for another property of such pejorative manifolds: they can be characterized ostensibly independently of f's good or ill behaviour. For want of a better term I use the word confluence to describe what happens to f on those manifolds. When f represents zeros of polynomials or eigenvalues of matrices the confluence is obvious: some zeros flow together as a polynomial x approaches a pejorative manifold; some eigenvalues flow together as a matrix X approaches a pejorative manifold. Confluence in a linear system is identified with collapse of the range of its matrix as it approaches a pejorative manifold; this manifold in matrix-space is the locus of discontinuities (drops) in the rank function.

Pejorative manifolds are interesting just because they are associated simultaneously with confluence and with an abrupt change from wild misbehaviour to tame good behaviour. Consider, for example, a polynomial $x_0$ so constructed as to ensure, in the absence of error, that among its zeros $y_0 = f(x_0)$ must be some that are coincident; but because error $\Delta x$ has crept into the data $x_0$, none of the available zeros $y_0 + \Delta y = f(x_0 + \Delta x)$ are coincident. They may well be nowhere near coincident. Frantic dispersal of perturbed zeros is frequently quite pronounced when $x_0$ is of high degree, and is not surprising when we realize how wildly f must misbehave near a pejorative manifold. Given only $x_0 + \Delta x$ and a bound for $\|\Delta x\|$, can we discover a nearby $x_1$ on a pejorative manifold? That $x_1$ will not be unique but, provided the bound on $\|\Delta x\|$ is small enough to keep $x_1$ well away from a doubly pejorative sub-manifold, we can expect that the multiple zeros among $y_1 = f(x_1)$ will not vary much as $x_1$ runs through those values on the pejorative manifold close to $x_0 + \Delta x$. Thus do we substitute a well-conditioned problem $f(x_1)$ for an ostensibly ill-conditioned problem $f(x_0 + \Delta x)$. On the other hand, we may discover that $x_0 + \Delta x$ is farther from the pejorative manifold than the bound on $\|\Delta x\|$, in which case we infer that something, either the bound or the construction of $x_0$, is wrong (i.e. mistaken).
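To see the frantic dispersal concretely, a sketch of ours (not the paper's) uses the hypothetical polynomial $(y - 1)^m - \varepsilon$, whose m-fold zero $y = 1$ splits into m zeros on a circle of radius $\varepsilon^{1/m}$ about 1. For a data change $\varepsilon = 10^{-12}$ a simple zero (m = 1) moves by about $10^{-12}$, whereas a 4-fold zero moves by about $10^{-3}$.

```python
import cmath

def split_zeros(m, eps):
    """Zeros of (y - 1)**m - eps: the m-fold zero y = 1 after the perturbation eps > 0."""
    radius = eps ** (1.0 / m)
    return [1 + radius * cmath.exp(2j * cmath.pi * k / m) for k in range(m)]

eps = 1e-12                                 # hypothetical perturbation of the constant term
spreads = {}
for m in (1, 2, 4, 8):
    zs = split_zeros(m, eps)
    spreads[m] = max(abs(z - 1) for z in zs)
    print(f"m = {m}: zeros move by about {spreads[m]:.3g} for a data change of {eps:g}")
```

The displacement $\varepsilon^{1/m}$ is wildly disproportionate to $\varepsilon$ only because multiplicity is not conserved; the multiple zero itself is what the rest of the paper shows to be well behaved.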

The properties of pejorative manifolds have many other practical implications but to discuss them here would be premature. First we must verify the foregoing assertions about those properties. Secondly, we should consider how to locate the manifolds computationally; here is where the theory is weak. Only for linear systems do we know how to tell cheaply whether a data-point x is close to or far from a pejorative manifold, and whether there are multiply pejorative sub-manifolds nearby, and where they are. Some of this knowledge is imparted in Part I of the paper.

Parts II and III consider polynomials' zeros and matrix eigenproblems respectively. For these problems the simplest pejorative manifolds, corresponding to double zeros and double eigenvalues, are easy enough to locate; but multiply pejorative sub-manifolds are not yet within reach of cheap computation. In particular, we cannot easily tell whether a data point x is far enough from a multiply pejorative sub-manifold that that sub-manifold need not be explored, unless x is very far from every such sub-manifold. Fortunately for our theory, multiply pejorative sub-manifolds need only rarely be considered; in ordinary language this means that double roots, though rare, are overwhelmingly more common in practice than are roots of higher multiplicity. Consequently, the theory is ripe for exploitation despite its immaturity. The theory's subsequent growth seems likely to depend upon numerical analysts' proficiency with algebraic geometry and metric spaces.

***** 

I take pleasure in acknowledging here the assistance and encouragement received, while the foregoing notions were evolving, from several years' discussions with many colleagues and friends. Especially, George Forsythe's continuing interest in those notions considerably stimulated their development. I am indebted too to the organizers of the 5th Gatlinburg Symposium on Numerical Linear Algebra, held at Los Alamos on June 5-10, 1972, for an opportunity to present those notions to a wide audience.


Part  I:  The  Pseudo-Inverse 

The pseudo-inverse $X^+$ of an $m \times n$ matrix X is uniquely defined formally by the familiar equations

$$XX^+X = X, \qquad X^+XX^+ = X^+, \qquad (X^+X)^* = X^+X, \qquad (XX^+)^* = XX^+, \tag{†}$$

but a better definition is derived from its principal application, the solution of linear least-squares problems: Given X and an m-vector v we seek the n-vector w which minimizes $\|v - Xw\|$, and when the minimizing w is not unique (as must be the case just when X's columns are linearly dependent) we seek that minimizing w with minimal $\|w\|$. The vector norm used here is $\|w\| = \sqrt{w^*w}$; we shall also use the induced matrix norm $\|Z\| = \max_w \|Zw\|/\|w\|$ and the root-sum-squares norm $\|Z\|_F = \sqrt{\operatorname{tr}(Z^*Z)}$.
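A small check of the equations (†), under assumptions of ours: X is real (so * is the transpose) and of rank one, $X = uv^T$, for which the standard closed form $X^+ = vu^T/(|u|^2|v|^2)$ is assumed. The sketch verifies the four equations and that $w = X^+r$ satisfies the normal equations $X^T(r - Xw) = 0$ while lying in the row space of X, which is what makes it the minimal-norm least-squares solution. The right-hand side is called r here (the text's v) to avoid a clash with the factor v; all data are illustrative.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def outer(u, v, scale=1.0):
    return [[scale * a * b for b in v] for a in u]

def rank_one_pinv(u, v):
    """Pseudo-inverse of X = u v^T for nonzero real u, v: X^+ = v u^T / (|u|^2 |v|^2)."""
    s = sum(a * a for a in u) * sum(b * b for b in v)
    return outer(v, u, 1.0 / s)

def close(A, B, tol=1e-12):
    return all(abs(a - b) <= tol for ra, rb in zip(A, B) for a, b in zip(ra, rb))

u, v = [1.0, 2.0, 2.0], [3.0, 4.0]        # X = u v^T is 3 x 2 with dependent columns
X, Xp = outer(u, v), rank_one_pinv(u, v)
checks = [
    close(matmul(matmul(X, Xp), X), X),                    # X X^+ X = X
    close(matmul(matmul(Xp, X), Xp), Xp),                  # X^+ X X^+ = X^+
    close(transpose(matmul(Xp, X)), matmul(Xp, X)),        # (X^+ X)^* = X^+ X
    close(transpose(matmul(X, Xp)), matmul(X, Xp)),        # (X X^+)^* = X X^+
]

r = [[1.0], [0.0], [0.0]]                  # right-hand side of the least-squares problem
w = matmul(Xp, r)                          # w = X^+ r
resid = [[r[i][0] - matmul(X, w)[i][0]] for i in range(3)]
normal = matmul(transpose(X), resid)       # X^T (r - X w) = 0: w minimizes ||r - X w||
print(checks, w, normal)
```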

The desired minimizing vector w turns out to be $w = X^+v$; see R. Penrose (1954, 1955). This formula is interesting only when X's columns are linearly dependent or nearly so, since otherwise we could substitute $X^+ = (X^*X)^{-1}X^*$ and ignore the equations (†) above. But just when $X^+$ becomes interesting it becomes numerically exasperating no matter what method is employed to compute it, because when X's columns are linearly dependent $X^+$ must be a violently discontinuous function of X and hence hypersensitive to small variations, as we shall see.

In what follows we shall discern a nested sequence

$$M_0 \supset M_1 \supset M_2 \supset \cdots$$

of pejorative (for $k \ge 1$) manifolds and sub-manifolds in the space of $m \times n$ matrices X; $M_k$ is the manifold of matrices whose rank does not exceed $\min(m, n) - k$. We shall discover that $X^+$ is a well-behaved function of X provided X is confined to $M_k$ and avoids $M_{k+1}$. More precisely, we shall find that while X and its infinitesimally neighbouring $X + \delta X$ are constrained to $M_k \setminus M_{k+1}$,

$$\|X^+\| = 1/(\text{the minimum distance } \|\cdot\| \text{ from } X \text{ to } M_{k+1}),$$

$$\|X^+\|^2 = \sup \|\delta(X^+)\|_F / \|\delta X\|_F \quad\text{over } X + \delta X \text{ on the same } M_k \text{ as } X.$$

Some of these discoveries have been seen before, particularly in the works of G.W. Stewart (1969), V. Pereyra (1969), and Golub and Pereyra (1972), whose treatments should be compared with what follows. Finally, we shall consider, given X and a tolerance $\varepsilon > 0$ such that all $X + \Delta X$ with $\|\Delta X\| \le \varepsilon$ must be regarded as indistinguishable for practical purposes, how to find an approximation $\hat X$ indistinguishable from X with the best-behaved $\hat X^+$.

Some apparatus is needed. Let us assume $m \ge n$ (otherwise transpose X) and denote X's n singular values in order by $\xi_1 \ge \xi_2 \ge \cdots \ge \xi_n \ge 0$. That $\xi_1 = \|X\|$ is well known, as is the fact that $X^+$'s singular values are the re-ordered numbers $\xi_j^+$, where

$$\xi^+ = 1/\xi \quad\text{except for } 0^+ = 0.$$

Not so well known is the following relation proved by L. Mirsky (1960, theorem 2):

$$\xi_k = \min \|\Delta X\| \quad\text{over } \operatorname{rank}(X + \Delta X) < k.$$

One implication of this relation, to be used later, is that no singular value of $X + \Delta X$ can differ from the correspondingly numbered singular value of X by more than $\|\Delta X\|$. Another implication, obtained via $\|X^+\| = \max_j \xi_j^+$, is that

$$\|X^+\| = 1/\min \|\Delta X\| \quad\text{over } \operatorname{rank}(X + \Delta X) < \operatorname{rank}(X).$$

Consequently, if $X \in M_k$ but $X \notin M_{k+1}$ then

$$\|X^+\| = 1/\min \|\Delta X\| \quad\text{over } X + \Delta X \in M_{k+1},$$

which is just what was claimed for $\|X^+\|$ above.
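For a real $2 \times 2$ matrix everything above can be computed in closed form. The sketch below (our own; the helper names are hypothetical) gets $\xi_1 \ge \xi_2$ from the eigenvalues of $X^TX$, builds the rank-one correction $E = (Xv)v^T$ from a right singular vector v for $\xi_2$, and checks that $X - E$ is singular, that $\|E\| = \xi_2 = 1/\|X^+\|$ (here $X^+ = X^{-1}$), and that randomly drawn rank-one matrices are never closer to X than $\xi_2$, as Mirsky's relation requires.

```python
import math
import random

def svals_2x2(X):
    """Singular values xi_1 >= xi_2 >= 0 of a real 2x2 matrix, via the eigenvalues of X^T X."""
    (a, b), (c, d) = X
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d   # X^T X = [[p, q], [q, r]]
    mid, rad = (p + r) / 2, math.hypot((p - r) / 2, q)
    return math.sqrt(mid + rad), math.sqrt(max(mid - rad, 0.0))

def nearest_singular_2x2(X):
    """Rank-one correction E = (X v) v^T, v a right singular vector for xi_2; X - E is singular."""
    (a, b), (c, d) = X
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    lam = (p + r) / 2 - math.hypot((p - r) / 2, q)         # xi_2 squared
    v = max([(q, lam - p), (lam - r, q)], key=lambda w: math.hypot(*w))  # eigenvector of X^T X
    nv = math.hypot(*v)
    v = (1.0, 0.0) if nv == 0 else (v[0] / nv, v[1] / nv)
    xv = (a * v[0] + b * v[1], c * v[0] + d * v[1])         # X v, of length xi_2
    return [[xv[0] * v[0], xv[0] * v[1]], [xv[1] * v[0], xv[1] * v[1]]]

X = [[3.0, 1.0], [1.0, 0.5]]
xi1, xi2 = svals_2x2(X)
E = nearest_singular_2x2(X)
Y = [[X[i][j] - E[i][j] for j in range(2)] for i in range(2)]
det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
Xinv = [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]
print(xi2, svals_2x2(E)[0], 1 / svals_2x2(Xinv)[0])

rng = random.Random(0)

def random_rank_one():
    u = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    w = (rng.uniform(-3, 3), rng.uniform(-3, 3))
    return [[u[0] * w[0], u[0] * w[1]], [u[1] * w[0], u[1] * w[1]]]

gaps = [svals_2x2([[X[i][j] - R[i][j] for j in range(2)] for i in range(2)])[0]
        for R in (random_rank_one() for _ in range(5000))]
print(min(gaps), xi2)   # no singular matrix should be closer to X than xi_2
```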

Next we shall exploit a little-known formula:

$$X^+ - Y^+ = -Y^+(X - Y)X^+ + (I - Y^+Y)(X - Y)^*X^{+*}X^+ + Y^+Y^{+*}(X - Y)^*(I - XX^+).$$

This formula can be verified by applying the equations (†) above to reduce the right-hand side to its simplest terms. Note that $(I - Y^+Y)$ and $(I - XX^+)$ are orthogonal projectors which annihilate $Y^+$ and $Y^{+*}$, and $X^+$ and $X^{+*}$, respectively. Consequently we find

$$\begin{aligned}
(X^+ - Y^+)^*(X^+ - Y^+) ={}& X^{+*}(X - Y)^*Y^{+*}Y^+(X - Y)X^+ \\
&- X^{+*}(X - Y)^*Y^{+*}Y^+Y^{+*}(X - Y)^*(I - XX^+) \\
&+ X^{+*}X^+(X - Y)(I - Y^+Y)(X - Y)^*X^{+*}X^+ \\
&- (I - XX^+)(X - Y)Y^+Y^{+*}Y^+(X - Y)X^+ \\
&+ (I - XX^+)(X - Y)(Y^+Y^{+*})^2(X - Y)^*(I - XX^+),
\end{aligned}$$

and taking norms yields an important inequality:

$$\begin{aligned}
\|X^+ - Y^+\|^2 &\le \|X - Y\|^2\left(\|X^+\|^4 + \|X^+\|^2\|Y^+\|^2 + 2\|X^+\|\,\|Y^+\|^3 + \|Y^+\|^4\right) \\
&\le 5\,\|X - Y\|^2 \max\{\|X^+\|, \|Y^+\|\}^4.
\end{aligned}$$
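A randomized check of the inequality, under assumptions of ours: real $2 \times 2$ matrices of rank 1 or 2 whose rank is known by construction, spectral norms computed in closed form, and the rank-one pseudo-inverse $X^+ = X^T/\|X\|_F^2$. The largest observed ratio $\|X^+ - Y^+\| / (\|X - Y\| \max\{\|X^+\|, \|Y^+\|\}^2)$ should never exceed $\sqrt{5}$, including pairs in which X and Y differ in rank.

```python
import math
import random

def spectral_norm(X):
    """Largest singular value of a real 2x2 matrix (closed form via X^T X)."""
    (a, b), (c, d) = X
    p, q, r = a * a + c * c, a * b + c * d, b * b + d * d
    return math.sqrt((p + r) / 2 + math.hypot((p - r) / 2, q))

def pinv2(X, rank):
    """Pseudo-inverse of a real 2x2 matrix whose rank (1 or 2) is known by construction."""
    (a, b), (c, d) = X
    if rank == 2:
        det = a * d - b * c
        return [[d / det, -b / det], [-c / det, a / det]]
    f2 = a * a + b * b + c * c + d * d      # rank one: X^+ = X^T / ||X||_F^2
    return [[a / f2, c / f2], [b / f2, d / f2]]

def diff(X, Y):
    return [[X[i][j] - Y[i][j] for j in range(2)] for i in range(2)]

def random_matrix(rank, rng):
    if rank == 2:
        return [[rng.uniform(-2, 2) for _ in range(2)] for _ in range(2)]
    u = [rng.uniform(-2, 2) for _ in range(2)]
    v = [rng.uniform(-2, 2) for _ in range(2)]
    return [[u[i] * v[j] for j in range(2)] for i in range(2)]

def worst_ratio(trials=10000, seed=1):
    """Largest observed ||X^+ - Y^+|| / (||X - Y|| max(||X^+||, ||Y^+||)^2)."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        rx, ry = rng.choice((1, 2)), rng.choice((1, 2))
        X, Y = random_matrix(rx, rng), random_matrix(ry, rng)
        Xp, Yp = pinv2(X, rx), pinv2(Y, ry)
        lhs = spectral_norm(diff(Xp, Yp))
        rhs = spectral_norm(diff(X, Y)) * max(spectral_norm(Xp), spectral_norm(Yp)) ** 2
        worst = max(worst, lhs / rhs)
    return worst

worst = worst_ratio()
print(worst, math.sqrt(5))
```

Random sampling cannot prove the bound; it only guards against a mis-transcribed formula.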


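Now let $Y = X + \Delta X$ and suppose both X and $X + \Delta X$ lie on $M_k$ but not on $M_{k+1}$. As $\Delta X \to 0$ we see that $\|(X + \Delta X)^+\|$ becomes and remains bounded (it is the reciprocal of the smallest non-zero singular value of $X + \Delta X$, which differs from the correspondingly numbered singular value of X by at most $\|\Delta X\|$), and then that $\|X^+ - (X + \Delta X)^+\| \to 0$. In short, $X^+$ is a continuous function of X on $M_k$ away from $M_{k+1}$. It soon follows that $X^+$ is differentiable too, for we need only set $Y = X + \delta X$, with infinitesimal $\delta X$ constrained to keep $X + \delta X$, like X, on $M_k$ away from $M_{k+1}$, to deduce that

$$\delta(X^+) = -X^+(\delta X)X^+ + (I - X^+X)(\delta X)^*X^{+*}X^+ + X^+X^{+*}(\delta X)^*(I - XX^+).$$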
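The first-order formula can be compared with an exact difference when both X and $X + \delta X$ have rank one, since then the pseudo-inverse has the closed form $(uv^T)^+ = vu^T/(|u|^2|v|^2)$. The sketch below (real data chosen arbitrarily, so * is the transpose; helper names are our own) perturbs u and v by $t = 10^{-6}$, which keeps the rank equal to one, and checks that the formula's prediction differs from the exact change by a relative amount of order t. All three terms of the formula are non-zero for these data.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def tr(A):
    """Transpose (the adjoint *, since everything here is real)."""
    return [list(row) for row in zip(*A)]

def comb(*terms):
    """Linear combination sum(c * M) of equally shaped matrices, given (c, M) pairs."""
    M0 = terms[0][1]
    return [[sum(c * M[i][j] for c, M in terms) for j in range(len(M0[0]))]
            for i in range(len(M0))]

def eye(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def outer(u, v):
    return [[a * b for b in v] for a in u]

def rank_one_pinv(u, v):
    """(u v^T)^+ = v u^T / (|u|^2 |v|^2) for nonzero real vectors u, v."""
    s = sum(a * a for a in u) * sum(b * b for b in v)
    return [[b * a / s for a in u] for b in v]

def dpinv_first_order(X, Xp, dX):
    """-X^+ dX X^+ + (I - X^+ X) dX^* X^{+*} X^+ + X^+ X^{+*} dX^* (I - X X^+)."""
    m, n = len(X), len(X[0])
    t1 = matmul(matmul(Xp, dX), Xp)
    t2 = matmul(matmul(comb((1, eye(n)), (-1, matmul(Xp, X))), tr(dX)), matmul(tr(Xp), Xp))
    t3 = matmul(matmul(matmul(Xp, tr(Xp)), tr(dX)), comb((1, eye(m)), (-1, matmul(X, Xp))))
    return comb((-1, t1), (1, t2), (1, t3))

# Hypothetical rank-one data and a rank-preserving perturbation (u + t a)(v + t b)^T.
u, v = [1.0, 2.0, 2.0], [3.0, 4.0]
a, b = [0.5, -1.0, 0.25], [-0.75, 0.5]
t = 1e-6
ut = [ui + t * ai for ui, ai in zip(u, a)]
vt = [vi + t * bi for vi, bi in zip(v, b)]
X, Xp = outer(u, v), rank_one_pinv(u, v)
dX = comb((1, outer(ut, vt)), (-1, X))
exact = comb((1, rank_one_pinv(ut, vt)), (-1, Xp))
approx = dpinv_first_order(X, Xp, dX)
size = max(abs(e) for row in exact for e in row)
err = max(abs(e - g) for re, rg in zip(exact, approx) for e, g in zip(re, rg))
print(size, err)      # err should be smaller than size by a factor of order t
```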


Next we seek to compute $\sup \|\delta(X^+)\|_F / \|\delta X\|_F$. To this end it is convenient to invoke Autonne's theorem, which exhibits $X = P\Lambda Q$ where P is $m \times m$ unitary, Q is $n \times n$ unitary, and $\Lambda$ is $m \times n$ diagonal with the singular values $\xi_j$ on its main diagonal. This singular value decomposition may be computed at modest cost by methods described in Golub and Reinsch (1970), and will be further exploited below. For the present let us partition

$$\Lambda = \begin{pmatrix} \Lambda_0 & 0 \\ 0 & 0 \end{pmatrix}$$

in such a way that just X's non-zero singular values appear on the diagonal of the square diagonal matrix $\Lambda_0$. Evidently $X^+ = Q^*\Lambda^+P^*$, where $\Lambda^+$ is the $n \times m$ matrix

$$\Lambda^+ = \begin{pmatrix} \Lambda_0^{-1} & 0 \\ 0 & 0 \end{pmatrix}.$$

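Also $\|X^+\| = \|\Lambda_0^{-1}\|$. Next partition conformally

$$\delta\Lambda = P^*(\delta X)Q^* = \begin{pmatrix} \delta A & \delta B \\ \delta C & \delta D \end{pmatrix};$$

by fixing P and Q independently of $\delta X$ we oblige $\delta\Lambda$ to be non-diagonal in general. Since $X + \delta X$ must have the same rank as X, $\Lambda + \delta\Lambda$ must have the same rank as $\Lambda$, and this must be the same as the rank of

$$\begin{pmatrix} I & 0 \\ -\delta C(\Lambda_0 + \delta A)^{-1} & I \end{pmatrix}(\Lambda + \delta\Lambda) = \begin{pmatrix} \Lambda_0 + \delta A & \delta B \\ 0 & \delta D - \delta C(\Lambda_0 + \delta A)^{-1}\delta B \end{pmatrix}.$$

The rank in question is that of $\Lambda_0$, and also of $\Lambda_0 + \delta A$ since $\delta A$ is infinitesimal. Therefore we must have

$$\delta D - \delta C(\Lambda_0 + \delta A)^{-1}\delta B = 0,$$

but this merely says that the infinitesimal $\delta D$ vanishes to first order, since $\delta C(\Lambda_0 + \delta A)^{-1}\delta B$ is of second order. Therefore, infinitesimal perturbations $\delta X$ for which X and $X + \delta X$ have the same rank must have the form

$$\delta X = P\begin{pmatrix} \delta A & \delta B \\ \delta C & 0 \end{pmatrix}Q.$$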


Substitution into the formula above for $\delta(X^+)$ soon leads to the conclusion that

$$\|\delta(X^+)\|_F \le \|X^+\|^2\,\|\delta X\|_F,$$

with equality possible when $\delta A$, $\delta B$ and $\delta C$ are chosen to have non-zero entries only in rows and columns corresponding to the largest entries in $\Lambda_0^{-1}$. Thus we conclude that pseudo-inversion can be ill-conditioned with respect to rank-preserving perturbations only if the data-matrix X is very near another of lower rank.
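Working the formula for $\delta(X^+)$ through in the partitioned coordinates (our own step, not displayed in the text) gives $\delta(X^+) = Q^*\begin{pmatrix} -\Lambda_0^{-1}\delta A\,\Lambda_0^{-1} & \Lambda_0^{-2}\delta C^* \\ \delta B^*\Lambda_0^{-2} & 0 \end{pmatrix}P^*$, so each entry of $\delta\Lambda$ is divided by at most $\xi_{\min}^2 = 1/\|X^+\|^2$. The sketch evaluates the Frobenius-norm ratio directly in these coordinates for random real perturbations and for one concentrated on the smallest singular value; the singular values and block sizes are hypothetical.

```python
import random

def frob_ratio(xi, dA, dB, dC):
    """||delta(X^+)||_F / ||delta X||_F for rank-preserving delta X = P [[dA, dB], [dC, 0]] Q.

    Uses the block form delta(X^+) = Q^* [[-L^-1 dA L^-1, L^-2 dC^*], [dB^* L^-2, 0]] P^*,
    L = diag(xi); the unitary factors P and Q drop out of both Frobenius norms.
    """
    r = len(xi)
    num = sum((dA[i][j] / (xi[i] * xi[j])) ** 2 for i in range(r) for j in range(r))
    num += sum((dB[i][k] / xi[i] ** 2) ** 2 for i in range(r) for k in range(len(dB[i])))
    num += sum((row[j] / xi[j] ** 2) ** 2 for row in dC for j in range(r))
    den = sum(e * e for M in (dA, dB, dC) for row in M for e in row)
    return (num / den) ** 0.5

xi = [3.0, 1.0, 0.25]            # non-zero singular values of a hypothetical rank-3 X (m=5, n=4)
bound = 1 / min(xi) ** 2          # ||X^+||^2
rng = random.Random(7)

def rand(rows, cols):
    return [[rng.gauss(0, 1) for _ in range(cols)] for _ in range(rows)]

ratios = [frob_ratio(xi, rand(3, 3), rand(3, 1), rand(2, 3)) for _ in range(2000)]
# Concentrating the perturbation on the smallest singular value attains the bound.
dA = [[0.0] * 3 for _ in range(3)]
dA[2][2] = 1.0
attained = frob_ratio(xi, dA, [[0.0]] * 3, [[0.0] * 3 for _ in range(2)])
print(max(ratios), attained, bound)
```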

Finally let us discuss how to compute a pseudo-inverse appropriate for a given matrix X when given also a tolerance $\varepsilon > 0$ such that all $X + \Delta X$ with $\|\Delta X\| \le \varepsilon$ must be regarded as indistinguishable from X. Should some of these matrices $X + \Delta X$ have different rank than X there must exist others whose pseudo-inverses differ arbitrarily much among each other. None of those wildly divergent pseudo-inverses can be useful. Instead let us find a matrix $\hat X = X + \Delta X$ of minimal rank with $\|\Delta X\| \le \varepsilon$. Such a matrix is easily obtained from $\Lambda$ above by annihilating all $\xi_j < \varepsilon$; let $\hat\Lambda$ denote what results and let $\hat X = P\hat\Lambda Q$. If $\xi_n > \varepsilon$, then $\hat\Lambda = \Lambda$ and $\hat X = X$. In this case for all $\Delta X$ with $\|\Delta X\| \le \varepsilon$ we find that

$$\|(X + \Delta X)^+\| \le 1/(\xi_n - \varepsilon) \quad\text{and}\quad \frac{\|(X + \Delta X)^+ - X^+\|}{\|X^+\|} \le \left(1 + \frac{\xi_n^2}{(\xi_n - \varepsilon)^2}\right)^{1/2}\frac{\varepsilon}{\xi_n}.$$

The latter inequality is obtained by substituting $Y = X + \Delta X$ and $X^+X = Y^+Y = I$ into the formula above for $X^+ - Y^+$, written with the roles of X and Y exchanged, and then taking the norm of $(Y^+ - X^+)(Y^+ - X^+)^*$ with the aid of $\|X^+\| = 1/\xi_n$ and $\|Y^+\| \le 1/(\xi_n - \varepsilon)$. The point of the inequality is that if $\varepsilon/\xi_n \ll 1$ we may confidently assert that all indistinguishable matrices $X + \Delta X$ have nearly the same pseudo-inverse.

The interesting case occurs when $\xi_{n+1-k} < \varepsilon < \xi_{n-k}$ for some $k > 0$. This means that among the matrices $X + \Delta X$ with $\|\Delta X\| \le \varepsilon$ are some of rank $n-k, n+1-k, \ldots, n$. Every time $X + \Delta X$ changes rank, $(X + \Delta X)^+$ jumps infinitely violently. But as $X + \Delta X$ runs through matrices on $M_k$ of rank $n-k$ with $\|\Delta X\| \le \varepsilon$, $(X + \Delta X)^+$ varies continuously and

$$\|(X + \Delta X)^+ - \hat X^+\| \le \sqrt{5}\,(\varepsilon + \xi_{n+1-k})/(\xi_{n-k} - \varepsilon)^2.$$

Whenever $\varepsilon/\xi_{n-k} \ll 1$, the pseudo-inverses of matrices on $M_k$ indistinguishable from X will differ only slightly among each other, although matrices $X + \Delta X$ not on $M_k$ will have huge and wildly varying pseudo-inverses; in this case $\hat X^+$ seems to be a reasonable response to the command

"Compute $X^+$."

But if $\xi_{n-k}$ is only moderately larger than $\varepsilon$ the command deserves to be questioned.
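The recipe can be sketched in singular-value coordinates, where annihilating the $\xi_j$ below $\varepsilon$ is trivial. The function below (its name and return format are our own) reports k, the singular values of $\hat X^+$, the ratio $\varepsilon/\xi_{n-k}$ that decides whether $\hat X^+$ is a reasonable answer, and the bound displayed above. The two calls use made-up singular values: one with a clear gap below $\varepsilon$, and one in which $\xi_{n-k}$ is only twice $\varepsilon$.

```python
import math

def truncated_pinv_summary(xi, eps):
    """Apply the tolerance recipe to singular values xi (sorted decreasingly).

    Singular values below eps are annihilated to form hat{X}; returns k (hat{X} has rank n - k),
    the singular values of hat{X}^+ listed against xi, the ratio eps / xi_{n-k}, and the bound
    sqrt(5) (eps + xi_{n+1-k}) / (xi_{n-k} - eps)^2 on ||(X + dX)^+ - hat{X}^+|| for X + dX of
    rank n - k with ||dX|| <= eps.
    """
    n = len(xi)
    r = sum(1 for s in xi if s >= eps)          # rank n - k of hat{X}
    inv = [1.0 / s if s >= eps else 0.0 for s in xi]
    if r == 0:
        return {"k": n, "inv": inv, "ratio": math.inf, "bound": math.inf}
    kept_min = xi[r - 1]                        # xi_{n-k}
    dropped_max = xi[r] if r < n else 0.0       # xi_{n+1-k}
    gap = kept_min - eps
    bound = math.sqrt(5) * (eps + dropped_max) / gap ** 2 if gap > 0 else math.inf
    return {"k": n - r, "inv": inv, "ratio": eps / kept_min, "bound": bound}

good = truncated_pinv_summary([10.0, 2.0, 1e-9, 1e-12], 1e-6)   # eps / xi_{n-k} tiny
shaky = truncated_pinv_summary([10.0, 2.0, 1e-9], 1.0)          # xi_{n-k} only twice eps
print(good)
print(shaky)
```

In the second case the bound (about 2.2) exceeds $\|\hat X^+\| = 0.5$ itself, which is the numerical face of a command that deserves to be questioned.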

Another way to appreciate $\hat X^+$ when $\varepsilon/\xi_{n-k} \ll 1 < \varepsilon/\xi_{n+1-k}$ is geometrical. Consider the image $\mathcal{P}$ under pseudo-inversion of the ball $\mathcal{B}$ of matrices $X + \Delta X$ with $\|\Delta X\| \le \varepsilon$; i.e. consider the set $\mathcal{P}$ of pseudo-inverses of all matrices in that ball $\mathcal{B}$. $\mathcal{P}$ has two disconnected components $\mathcal{P}_0$ and $\mathcal{P}_\infty$. $\mathcal{P}_0$ consists of the pseudo-inverses of matrices in $\mathcal{B} \cap M_k$, and looks like a small bent coin roughly centered on $\hat X^+$; all the points $(X + \Delta X)^+$ in $\mathcal{P}_0$ are close to $\hat X^+$ (see the inequality above) and have modest norms not exceeding $1/(\xi_{n-k} - \varepsilon)$. The other component $\mathcal{P}_\infty$ has tentacles which reach to $\infty$ starting from far-out points $(X + \Delta X)^+$ which must satisfy $\|(X + \Delta X)^+\| \ge 1/(\varepsilon + \xi_{n+1-k}) \gg 1/(\xi_{n-k} - \varepsilon)$.


Part  II:  Zeros  of  Polynomials 


Many numerical analysts suffer from a misconception that multiple roots are infinitely more ill-conditioned than simple roots. Actually, a multiple root behaves much better than the clustered simple root-approximations so often accepted in its place. More precisely, we shall find that each zero of a polynomial is a differentiable function of its coefficients provided that zero's multiplicity is conserved; only when multiplicities change can the derivatives become infinite. Moreover we shall find that the condition number of a multiple zero must be inversely proportional to the product of the distances from that multiple zero to all other zeros of the polynomial. For the problem of finding polynomials' zeros the pejorative manifolds and sub-manifolds in the space of polynomials are evidently the loci occupied by polynomials with various combinations of multiple zeros (one double zero, two double zeros, ..., one triple zero, one triple and one double zero, ...). However, given a polynomial x no convenient way is known yet for determining how near x is to a pejorative manifold short of computing laboriously all the points nearest x on each of the various manifolds and sub-manifolds. We shall describe some of the easier such calculations.

II.1: Differentiability of Multiple Zeros

If $\zeta$ is a simple zero of the monic polynomial

$$x(\tau) = \tau^n - \sum_{j=1}^{n} x_j\,\tau^{n-j},$$

then x's first derivative $x'(\tau)$ cannot vanish at $\zeta$, and hence each $\partial\zeta/\partial x_j = \zeta^{n-j}/x'(\zeta)$ must be finite, whence it follows that $\zeta$ must be an analytic function of each coefficient $x_j$ as long as $\zeta$ remains simple. To what extent can this assertion be valid when $\zeta$ is a multiple zero of x?
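A quick check of $\partial\zeta/\partial x_j = \zeta^{n-j}/x'(\zeta)$ on the hypothetical cubic $(\tau-1)(\tau-2)(\tau-3)$, written in the text's form $\tau^3 - x_1\tau^2 - x_2\tau - x_3$ with $x = (6, -11, 6)$: each coefficient is nudged by $h = 10^{-7}$, the zero near 3 is re-found by Newton's iteration (our choice of solver), and the finite-difference quotient is compared with the formula.

```python
def value_and_slope(xs, t):
    """x(t) and x'(t) for x(t) = t^n - sum_j xs[j-1] t^(n-j), by Horner's rule."""
    p, dp = 1.0, 0.0
    for c in xs:
        dp = dp * t + p
        p = p * t - c
    return p, dp

def newton_zero(xs, t, steps=30):
    """Refine an approximation t to a simple zero of x by Newton's iteration."""
    for _ in range(steps):
        p, dp = value_and_slope(xs, t)
        if p == 0.0:
            break
        t -= p / dp
    return t

xs = [6.0, -11.0, 6.0]           # x(t) = t^3 - 6t^2 + 11t - 6 = (t - 1)(t - 2)(t - 3)
zeta = 3.0                       # a simple zero; x'(3) = 2
slope = value_and_slope(xs, zeta)[1]
h = 1e-7
results = []
for j in (1, 2, 3):
    predicted = zeta ** (3 - j) / slope          # d zeta / d x_j = zeta^(n-j) / x'(zeta)
    bumped = list(xs)
    bumped[j - 1] += h
    observed = (newton_zero(bumped, zeta) - zeta) / h
    results.append((predicted, observed))
    print(f"j = {j}: predicted {predicted:.6f}, finite difference {observed:.6f}")
```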

Whenever x has a multiple zero its coefficients $x_j$ must satisfy certain constraints expressible as polynomial equations in those coefficients with the aid of determinants known as bigradients or resultants; see Bôcher (1907, ch. XV) or Householder (1970, §§1.2-3) or van der Waerden (1950, ch. XI). It suffices to acknowledge these constraints without describing them, and then exploit them with the following result:


Proposition II.1: The constraints satisfied by the coefficients $x_j$ of the monic polynomial

$$x(\tau) = \tau^n - \sum_{j=1}^{n} x_j\,\tau^{n-j}$$

when it possesses an m-tuple zero $\zeta$ define $\zeta$ and the last $m-1$ coefficients $x_{n+2-m}, \ldots, x_n$ to be analytic functions of each of the first $n+1-m$ coefficients $x_1, x_2, \ldots, x_{n+1-m}$ as long as the multiplicity of $\zeta$ remains precisely m, irrespective of the other zeros' multiplicities. And then if $\zeta_{m+1}, \zeta_{m+2}, \ldots, \zeta_n$ are x's other $n-m$ zeros, different from $\zeta$ but otherwise not necessarily distinct,

$$\frac{\partial\zeta}{\partial x_i} = \frac{(n-i)!\,\zeta^{\,n+1-m-i}}{(n+1-m-i)!\;m!\;\prod_{j=m+1}^{n}(\zeta - \zeta_j)} \qquad\text{for } i = 1, 2, \ldots, n+1-m.$$

Proof: Since $\zeta$ is an m-tuple zero of x, $x^{(m)}(\zeta) \ne 0$ but $x^{(m-1)}(\zeta) = 0$, $x^{(m-2)}(\zeta) = 0$, ..., $x'(\zeta) = 0$ and $x(\zeta) = 0$. The first two relations imply that $\zeta$, as a simple zero of $x^{(m-1)}$, must be an analytic function of its coefficients, which involve only $x_1, x_2, \ldots, x_{n+1-m}$. Substituting that function for $\zeta$ in the last $m-1$ equations exhibits the last $m-1$ coefficients in turn as analytic functions of the first $n+1-m$. Then differentiate the equation $x^{(m-1)}(\zeta) = 0$ with respect to $x_i$ to produce

$$x^{(m)}(\zeta)\,\frac{\partial\zeta}{\partial x_i} - \frac{(n-i)!\,\zeta^{\,n+1-m-i}}{(n+1-m-i)!} = 0,$$

and apply Leibniz's rule to $x(\tau) = (\tau - \zeta)^m\prod_{j=m+1}^{n}(\tau - \zeta_j)$ to produce $x^{(m)}(\zeta) = m!\prod_{j=m+1}^{n}(\zeta - \zeta_j)$, whence follows the last part of the proposition.
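Proposition II.1 can be checked on the hypothetical cubic $(\tau - 1)^2(\tau + 2) = \tau^3 - 3\tau + 2$, i.e. $n = 3$, $m = 2$, $x_1 = 0$, $x_2 = 3$, $x_3 = -2$. Following the proof, the sketch computes the double zero as the zero of $x'$ near 1 from the free coefficients $x_1, x_2$, recovers $x_3$ from $x(\zeta) = 0$, and compares central differences of $\zeta$ with the proposition's formula, which gives $1/3$ and $1/6$ here.

```python
import math

def prop_ii1(i, n, m, zeta, others):
    """d zeta / d x_i from Proposition II.1, for i = 1, ..., n + 1 - m."""
    denom = math.factorial(n + 1 - m - i) * math.factorial(m) * math.prod(zeta - z for z in others)
    return math.factorial(n - i) * zeta ** (n + 1 - m - i) / denom

def double_zero_cubic(x1, x2):
    """Cubic t^3 - x1 t^2 - x2 t - x3 constrained to keep a double zero (n = 3, m = 2).

    zeta is the zero of x'(t) = 3t^2 - 2 x1 t - x2 near +1 for the data used below; x3 then
    follows from x(zeta) = 0, and the simple zero from the sum of zeros, x1 = 2 zeta + z3.
    """
    zeta = (x1 + math.sqrt(x1 * x1 + 3 * x2)) / 3
    x3 = zeta ** 3 - x1 * zeta ** 2 - x2 * zeta
    return zeta, x3, x1 - 2 * zeta

# Base point: (t - 1)^2 (t + 2) = t^3 - 3t + 2, i.e. x1 = 0, x2 = 3, x3 = -2.
x1, x2, h = 0.0, 3.0, 1e-6
zeta, x3, z3 = double_zero_cubic(x1, x2)
fd1 = (double_zero_cubic(x1 + h, x2)[0] - double_zero_cubic(x1 - h, x2)[0]) / (2 * h)
fd2 = (double_zero_cubic(x1, x2 + h)[0] - double_zero_cubic(x1, x2 - h)[0]) / (2 * h)
print(fd1, prop_ii1(1, 3, 2, zeta, [z3]))   # both about 1/3
print(fd2, prop_ii1(2, 3, 2, zeta, [z3]))   # both about 1/6
```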


Here are three examples to illustrate the proposition. First, a quadratic $\tau^2 - 2\alpha\tau + \beta$ has a double zero $\zeta = \alpha$ just when $\beta = \alpha^2$; here $\zeta$ and $\beta$ are analytic functions of $\alpha$ as claimed in the proposition, but if we regarded $\zeta$ and $\alpha$ as functions of $\beta$ they would have a branch-point singularity at $\beta = 0$. This first example provides some excuse for regarding, as does the proposition, the first $n+1-m$ coefficients $x_j$ instead of some other subset as independent variables.

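The second example is a quartic

$$\tau^4 - 4\alpha\tau^3 + 6\beta\tau^2 - 4\gamma\tau + \delta$$

which has a triple zero $\zeta = \alpha + \Delta$ whenever $\gamma = (\alpha + \Delta)^2(\alpha - 2\Delta)$ and $\delta = (\alpha + \Delta)^3(\alpha - 3\Delta)$, where $\Delta = \pm(\alpha^2 - \beta)^{1/2}$; evidently $\zeta$, $\gamma$ and $\delta$ are analytic functions of $\alpha$ and $\beta$ except at the branch point where $\Delta = 0$, at which point $\zeta$ becomes a quadruple zero.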
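The second example can be verified numerically for sample values of our choosing ($\alpha = 1$, $\beta = 0.19$, so $\Delta = \pm 0.9$): for either sign of $\Delta$ the quartic and its first two derivatives vanish at $\zeta = \alpha + \Delta$, while the third derivative, $24\Delta$, does not; so $\zeta$ is exactly a triple zero, and it becomes quadruple only when $\Delta = 0$.

```python
import math

def triple_zero_data(alpha, beta, sign=1.0):
    """zeta, gamma, delta for the quartic t^4 - 4 alpha t^3 + 6 beta t^2 - 4 gamma t + delta,
    with Delta = sign * sqrt(alpha^2 - beta) (real case alpha^2 >= beta assumed)."""
    D = sign * math.sqrt(alpha * alpha - beta)
    return alpha + D, (alpha + D) ** 2 * (alpha - 2 * D), (alpha + D) ** 3 * (alpha - 3 * D)

def quartic_derivs(alpha, beta, gamma, delta, t):
    """q(t), q'(t), q''(t), q'''(t) for q(t) = t^4 - 4 alpha t^3 + 6 beta t^2 - 4 gamma t + delta."""
    return (t ** 4 - 4 * alpha * t ** 3 + 6 * beta * t ** 2 - 4 * gamma * t + delta,
            4 * t ** 3 - 12 * alpha * t ** 2 + 12 * beta * t - 4 * gamma,
            12 * t ** 2 - 24 * alpha * t + 12 * beta,
            24 * t - 24 * alpha)

alpha, beta = 1.0, 0.19
checks = {}
for sign in (1.0, -1.0):
    zeta, gamma, delta = triple_zero_data(alpha, beta, sign)
    checks[sign] = quartic_derivs(alpha, beta, gamma, delta, zeta)
    print(sign, zeta, checks[sign])   # first three entries ~ 0; the last is 24 * Delta
```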


The third example is the quartic

$$q(\tau, \lambda) = \tau^4 - (2 + \lambda^2)\tau^2 + 2\lambda|\lambda|\tau + 1 - \lambda^2 = (\tau - \operatorname{sign}\lambda)^2(\tau + \operatorname{sign}\lambda + \lambda)(\tau + \operatorname{sign}\lambda - \lambda) \qquad\text{for real } \lambda,$$

which has a double zero $\zeta$ for all real $\lambda$, but $\zeta = 1$ for $\lambda > 0$ and $\zeta = -1$ for $\lambda < 0$, with ambiguity and discontinuity at $\lambda = 0$, despite that $\zeta$ and the last coefficient $1 - \lambda^2$ may appear to be formally analytic functions of the first three coefficients $(0, 2 + \lambda^2, 2\lambda|\lambda|)$. But these first coefficients are not free here to vary independently, nor are they analytic functions of the real parameter $\lambda$ near $\lambda = 0$. A better explanation for the apparent anomaly is obtained from a geometrical approach which identifies quartic polynomials with points in a 4-dimensional space. The polynomials with double zeros constitute a 3-dimensional manifold in that space; the manifold intersects itself at points corresponding to polynomials, like $q(\tau, 0)$, with two double zeros. As $\lambda$ runs from $-1$ to 0 to $+1$, say, $q(\tau, \lambda)$ runs along one sheet of that manifold to a point of self-intersection and then turns a corner to run along the other sheet. The 3-dimensional manifold is pejorative; the corner where q's double zero is discontinuous lies on a multiply pejorative sub-manifold. Little seems to be known about the complicated geometry of these manifolds.
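A sketch confirming the factorization of $q(\tau, \lambda)$ at a few sample points, and that its double zero is $+1$ for $\lambda > 0$ and $-1$ for $\lambda < 0$; `math.copysign` stands in for $\operatorname{sign}\lambda$. At $\lambda = 0$ both factorizations reduce to $(\tau^2 - 1)^2$, the point of self-intersection.

```python
import math

def q(t, lam):
    return t ** 4 - (2 + lam ** 2) * t ** 2 + 2 * lam * abs(lam) * t + 1 - lam ** 2

def dq(t, lam):
    """Derivative of q with respect to t."""
    return 4 * t ** 3 - 2 * (2 + lam ** 2) * t + 2 * lam * abs(lam)

def q_factored(t, lam):
    s = math.copysign(1.0, lam)          # sign(lam); at lam = 0 either sign gives (t^2 - 1)^2
    return (t - s) ** 2 * (t + s + lam) * (t + s - lam)

lams = (-0.5, -0.1, -1e-3, 1e-3, 0.1, 0.5)
for lam in lams:
    zeta = math.copysign(1.0, lam)       # the double zero jumps from -1 to +1 across lam = 0
    print(lam, zeta, q(zeta, lam), dq(zeta, lam))
```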

II.2: Condition Numbers for Multiple Zeros

The condition of a zero $\zeta$ of a polynomial x is generally a vague notion (cf. Wilkinson (1963, pp. 29-32 and 47-48)) partly because the metric by which we measure distance between polynomials is so often arbitrary. A natural metric for polynomials regarded as points in a linear space is a vector norm $\|\cdots\|$; e.g. for arbitrary weights $w_j > 0$

$$\Big\|\sum_{j=0}^{n} x_j\,\tau^{n-j}\Big\| = \Big(\sum_{j=0}^{n} w_j\,|x_j|^2\Big)^{1/2}.$$

Although we shall use just this last norm in what follows, the statements concerning condition numbers will be stated for (and are valid for) any vector norm. Whatever the norm, one corresponding condition number for a zero $\zeta$ of a polynomial x will be defined to be

$$\gamma(\zeta, x, \|\cdots\|) = \sup_{\delta x} |\delta\zeta| / \|\delta x\|$$

where $\delta\zeta$ is the infinitesimal change in $\zeta$ caused by changing the polynomial x infinitesimally to $x + \delta x$. This condition number $\gamma$ is appropriate when absolute variations in $\zeta$ and x are at issue; $\gamma/|\zeta|$ is a more appropriate condition number when relative variations $\delta\zeta/\zeta$ are at issue.

Of course $\gamma$'s definition makes sense only if $\delta x$ is understood to be so constrained that $\zeta$'s multiplicity is conserved; otherwise $\zeta$ loses its identity, disintegrating into a cluster of zeros whose condition numbers approach infinity as the cluster coalesces upon $\zeta$. This assertion, which we have yet to prove, explains why multiple zeros have a bad reputation for ill-condition, undeservedly acquired by association with the cluster of closely spaced and therefore ill-conditioned approximate zeros which are so often accepted instead of multiple zeros; cf. Wilkinson (1963, p. 41, §8).
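The contrast drawn above can be seen on the quadratic $\tau^2 - 2\tau + 1$ ($x_1 = 2$, $x_2 = -1$ in the text's notation), using the unweighted norm $\|\delta x\| = (\delta x_1^2 + \delta x_2^2)^{1/2}$, an assumption of this sketch. Changing only $x_2$ by $\varepsilon$ splits the double zero into $1 \pm \sqrt{\varepsilon}$, so $|\delta\zeta|/\|\delta x\| = \varepsilon^{-1/2}$ grows without bound; changing $\alpha$ with $\beta = \alpha^2$ conserves the multiplicity, and computing the double zero as the zero of $x'(\tau) = 2\tau - x_1$ keeps the ratio near $1/(2\sqrt{2})$, which by our own arithmetic is the constrained condition number here.

```python
import math

def cluster_ratio(eps):
    """Unconstrained change: t^2 - 2t + 1 becomes t^2 - 2t + (1 - eps), i.e. x2 = -1 + eps.
    The double zero 1 splits into 1 -/+ sqrt(eps); returns |d zeta| / ||dx||."""
    x1, x2 = 2.0, -1.0 + eps
    half_spread = math.sqrt(x1 * x1 / 4 + x2)      # zeros are x1/2 -/+ half_spread
    return half_spread / eps

def conserved_ratio(eps):
    """Multiplicity-conserving change: alpha = 1 -> 1 + eps with beta = alpha^2, so the quadratic
    stays (t - alpha)^2. The double zero is computed as the zero of x'(t) = 2t - x1."""
    alpha = 1.0 + eps
    x1, x2 = 2 * alpha, -alpha ** 2               # t^2 - x1 t - x2 = (t - alpha)^2
    dx = math.hypot(x1 - 2.0, x2 + 1.0)
    zeta = x1 / 2
    return abs(zeta - 1.0) / dx

for eps in (1e-4, 1e-6, 1e-8):
    print(eps, cluster_ratio(eps), conserved_ratio(eps))
```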


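Proposition II.2: If $\zeta$ is an m-tuple zero of a monic polynomial x whose other zeros are $\zeta_{m+1}, \zeta_{m+2}, \ldots, \zeta_n$, then its condition number is

$$\gamma(\zeta, x, \|\cdots\|) = K(m, n, \zeta, \|\cdots\|)\Big/\prod_{j=m+1}^{n}|\zeta - \zeta_j|,$$

where K is independent of x and its zeros other than $\zeta$.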


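Proof (for any norm $\|\cdots\|$): If $x(\tau) = \tau^n - \sum_{j=1}^{n} x_j\tau^{n-j}$ and $\delta x(\tau) = -\sum_{j=1}^{n} \delta x_j\tau^{n-j}$, then by Proposition II.1

$$\delta\zeta = \sum_{i=1}^{n+1-m} \frac{(n-i)!\,\zeta^{\,n+1-m-i}\,\delta x_i}{(n+1-m-i)!\;m!\;\prod_{j=m+1}^{n}(\zeta - \zeta_j)} = e\,\delta x^{(m-1)}\Big/\prod_{j=m+1}^{n}(\zeta - \zeta_j),$$

where e denotes the linear functional $e\,p = -p(\zeta)/m!$. Here $\delta\zeta$ is expressed as a linear function of the first $n+1-m$ infinitesimal coefficients $\delta x_j$. The last $m-1$ coefficients are also linear functions of the first $n+1-m$, obtained by solving a triangular system of linear equations derived from the equations

$$(x + \delta x)^{(k)}(\zeta + \delta\zeta) = 0 \quad\text{for } k = 0, 1, 2, \ldots, m-1,$$

which to first order read

$$\delta x^{(k)}(\zeta) + x^{(k+1)}(\zeta)\,\delta\zeta = 0 \quad\text{for } k = 0, 1, 2, \ldots, m-1.$$

Since $x^{(k+1)}(\zeta) = 0$ for $k < m-1$, the last set reduces simply to

$$\delta x^{(k)}(\zeta) = 0 \quad\text{for } k = 0, 1, 2, \ldots, m-2,$$

which may be solved for $\delta x_{n+2-m}, \delta x_{n+3-m}, \ldots, \delta x_n$ in turn. Hence there exists some linear operator Q depending upon $\zeta$, n and m alone such that

$$\delta x = Q\,\delta x^{(m-1)}.$$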

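This linear operator $Q$ transforms an arbitrary polynomial $p$ of degree $n-m$ into another polynomial $q = Qp$ of degree $n-1$ in such a way that

$$q(\zeta) = q'(\zeta) = \cdots = q^{(m-2)}(\zeta) = 0 \quad\text{and}\quad q^{(m-1)} = p .$$

These last few equations constitute an initial value problem whose solution (for $m \ge 2$; when $m = 1$, $Q$ is the identity) is

$$q(\tau) = (Qp)(\tau) = \frac{1}{(m-2)!} \int_{\zeta}^{\tau} (\tau - \theta)^{m-2}\, p(\theta)\, d\theta .$$

Hence we deduce that $Qp = 0$ only if $p = 0$. Therefore

$$|||p||| \equiv \|Qp\|$$

is another norm on the linear space of polynomials $p$ of degree $n-m$. Now

$$\gamma = \sup |\delta\zeta| / \|\delta x\| \quad\text{over constrained } \delta x = Q\,\delta x^{(m-1)},$$

and $\delta\zeta = e\,\delta x^{(m-1)} \big/ \prod_{j=m+1}^{n} (\zeta - \zeta_j)$. Here $e$ is the linear functional defined above in the earlier expression for $\delta\zeta$, namely $e\,p = -p(\zeta)/m!$. Hence

$$\gamma = \sup \frac{|e\,\delta x^{(m-1)}|}{|||\delta x^{(m-1)}|||} \Big/ \prod_{j=m+1}^{n} |\zeta - \zeta_j| = K \Big/ \prod_{j=m+1}^{n} |\zeta - \zeta_j| ,$$

as claimed, where

$$K = \sup |e\,p| / |||p||| \quad\text{over } (n-m)\text{-degree polynomials } p .$$

$K$ depends upon $m$, $n$, $\zeta$ and the norm $\|\cdot\|$, but not upon $x$ nor upon its zeros other than $\zeta$.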
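As a small illustration (not part of the proof), the operator $Q$ can be applied with exact rational arithmetic. The sketch below assumes polynomials are stored as ascending coefficient lists, and the helper names (`apply_Q`, `integrate_from`, …) are hypothetical. It integrates $p$ exactly $m-1$ times, each time starting from $\zeta$. By Cauchy's formula for repeated integration this gives the integral representation of $Qp$ above. It then confirms the initial conditions and $q^{(m-1)} = p$.

```python
from fractions import Fraction


def evaluate(c, t):
    """Evaluate sum_i c[i] * t**i (coefficients in ascending powers) by Horner's rule."""
    acc = Fraction(0)
    for a in reversed(c):
        acc = acc * t + a
    return acc


def derivative(c):
    """Coefficients of the derivative, still in ascending powers."""
    return [i * c[i] for i in range(1, len(c))]


def integrate_from(c, zeta):
    """Antiderivative of c, chosen so that it vanishes at tau = zeta."""
    prim = [Fraction(0)] + [Fraction(a) / (i + 1) for i, a in enumerate(c)]
    prim[0] = -evaluate(prim, zeta)
    return prim


def apply_Q(p, zeta, m):
    """q = Q p: q^(k)(zeta) = 0 for k < m - 1 and q^(m-1) = p."""
    q = [Fraction(a) for a in p]
    for _ in range(m - 1):
        q = integrate_from(q, zeta)
    return q


# Example: p(tau) = 1 + 2 tau + 3 tau^2 (degree n - m = 2), zeta = 2, m = 3.
q = apply_Q([1, 2, 3], Fraction(2), 3)
print(len(q) - 1)                                   # degree of q: n - 1 = 4
print(evaluate(q, 2), evaluate(derivative(q), 2))   # both vanish at zeta
```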


Corollary: Proposition II.2 may be applied to non-monic polynomials $x(\tau) = \sum_{j=0}^{n} x_j \tau^{n-j}$ with $x_0 \ne 0$, provided $K$ is replaced by a different function.


The foregoing results fail to reflect one important aspect of floating point computation: independence of scaling. Specifically, we would expect the relative precision of approximations to a zero $\tilde\zeta = \sigma\zeta$ of $\tilde x(\tau) = \sigma^n x(\tau/\sigma)$ to be independent of $\sigma$. This should hold at least as long as the scale factor $\sigma$ is a modest power of the computer's radix. The proposition above appears to give results which are altered by scaling, but it can also be applied in a way independent of scaling. The proposition remains valid when the norm $\|\cdot\|$ varies with $\zeta$, as for example does


$$\|x\| = \Big( \sum_{j} |x_j\, \zeta^{-j}|^2 \Big)^{1/2} .$$


More generally, suppose the norm $\|\cdot\|$ varies with $\zeta$ in such a way that $\|x(\tau)\| = |||\zeta^{-n} x(\zeta\tau)|||$ for some norm $|||\cdot|||$ independent of $\zeta$. Then we find that

$$\gamma\big(\sigma\zeta, \sigma^n x(\tau/\sigma), \|\cdot\|\big) / |\sigma\zeta| = \gamma(\zeta, x, \|\cdot\|) / |\zeta| = K(m, n, 1, |||\cdot|||) \Big/ \prod_{j=m+1}^{n} |1 - \zeta_j/\zeta| .$$

So the condition number $\gamma/|\zeta|$ is independent of scaling. It depends only upon the multiplicity $m$ of $\zeta$ and upon its relative separation from $x$'s other zeros. Consequently, when such a $\zeta$-dependent norm is used, only clusters of relatively closely spaced zeros can be ill-conditioned.

The word "cluster" has been used very loosely above. One might hardly consider the zeros of $x(\tau) = \prod_{j=1}^{20} (\tau - j)$ to constitute a cluster in the usual sense. Yet Wilkinson (1963, pp. 41-43) observed the zeros near 15 to be ferociously ill-conditioned. This observation does not contradict proposition II.2 and its corollaries: when the constant $K$ is evaluated (for $m = 1$ here) we do get condition numbers of the order of $10^{\dots}$.

This means that one's intuition about clusters is unlikely to be reliable.
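The ferocity can be seen concretely by computing, in exact integer arithmetic, how far the zero $\zeta = 15$ of $x(\tau) = \prod_{j=1}^{20}(\tau - j)$ can move per unit relative perturbation of the coefficients. This sketch measures sensitivity coefficient-wise as $\sum_{j \ge 1} |x_j|\,|\zeta|^{n-j} / (|x'(\zeta)|\,|\zeta|)$. That measure is our assumption for illustration, not the norm used in the text, so its values need not equal $K/\prod|\zeta_j - \zeta|$. The helper names are hypothetical.

```python
from fractions import Fraction


def poly_from_roots(roots):
    """Coefficients x_0, ..., x_n of prod (tau - r), highest power first (x_0 = 1)."""
    c = [1]
    for r in roots:
        # Multiply by (tau - r): new[i] = c[i] - r * c[i - 1].
        c = [a - r * b for a, b in zip(c + [0], [0] + c)]
    return c


def derivative_at(c, t):
    """x'(t) for coefficients given highest power first."""
    n = len(c) - 1
    return sum(a * (n - j) * t ** (n - j - 1) for j, a in enumerate(c[:-1]))


def relative_condition(c, zeta):
    """|d zeta / zeta| per unit relative change in each non-leading coefficient x_1..x_n."""
    n = len(c) - 1
    num = sum(abs(c[j]) * abs(zeta) ** (n - j) for j in range(1, n + 1))
    return Fraction(num, abs(derivative_at(c, zeta) * zeta))


x = poly_from_roots(range(1, 21))
print(float(relative_condition(x, 15)))  # about 5e13
print(float(relative_condition(x, 1)))   # about 420
```

Under this measure the zero at 1 has a condition number of only about 420, while the zero at 15 has one of about $5 \times 10^{13}$.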

Calculations by a student, Mr. David Hough, have shown that one need only perturb each coefficient of $x(\tau) = \prod_{j=1}^{20} (\tau - j)$ by less than one part in $10^{\dots}$ to construct a nearby polynomial $x + \Delta x$ whose zeros, while still all real, include a double zero. Consequently the polynomial $x$ is very close to a pejorative manifold. In fact, it is almost equally close to several multiply pejorative sub-manifolds. These observations explain the ill-condition of Wilkinson's polynomial more convincingly than can any allegation of clustering among its zeros.

II.3: Where are the Pejorative Manifolds?

Suppose $m$ of a polynomial's zeros are clustered closely in a region well-separated from the rest of the zeros. It is then natural to expect that a small perturbation in the polynomial's coefficients should suffice to collapse the cluster into an $m$-tuple zero. That $m$-tuple zero must be a simple zero of the perturbed polynomial's $(m-1)$st derivative, and therefore close to a zero of the original polynomial's $(m-1)$st derivative. Consequently, when we wish to substitute what we hope is a well-behaved $m$-tuple zero for a cluster of $m$ ill-behaved zeros, we can approximate the $m$-tuple zero by a simple zero of the polynomial's $(m-1)$st derivative. This works provided such a simple zero can be found near the cluster. The next result guarantees that such a zero can be found.


Lemma II.3: Suppose the $n$th degree polynomial $x(\tau)$ has at least $m$ zeros $\zeta_1, \zeta_2, \ldots, \zeta_m$ $(1 \le m \le n)$ in some convex region $C$. Then $x^{(m-1)}(\tau)$ must vanish at least once in the star-shaped region $S$ consisting of all points from which $C$ subtends an angle no less than $\pi/(n+1-m)$.


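Proof: Let $\Delta^{m-1}x(\zeta_1, \zeta_2, \ldots, \zeta_m)$ be the $(m-1)$st divided difference of $x(\tau)$ over the zeros $\zeta_1, \zeta_2, \ldots, \zeta_m$. Since each $x(\zeta_j) = 0$, that divided difference must vanish. Therefore we obtain

$$0 = \Delta^{m-1}x(\zeta_1, \ldots, \zeta_m) = \mathop{\int \cdots \int}_{\text{all } \sigma_j \ge 0 \text{ and } \sum_j \sigma_j = 1} x^{(m-1)}\Big( \sum_{j=1}^{m} \sigma_j \zeta_j \Big)\, d\sigma_2 \cdots d\sigma_m$$

from a formula attributed to Hermite and to Genocchi by Milne-Thomson (1933, p. 10 and p. 18 ex. 6). Let us denote the $n+1-m$ zeros of $x^{(m-1)}$ by $\eta_m, \eta_{m+1}, \ldots, \eta_n$ and so infer

$$\mathop{\int \cdots \int}_{\text{all } \sigma_j \ge 0 \text{ and } \sum_j \sigma_j = 1} \prod_{k=m}^{n} \Big( \eta_k - \sum_{j=1}^{m} \sigma_j \zeta_j \Big)\, d\sigma_2 \cdots d\sigma_m = 0 .$$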


From this point we pursue an argument similar to Marden's (1966, §24).

Were every $\eta_k$ outside $S$, we could find a $\theta_k$ for each $k = m, m+1, \ldots, n$ such that

$$0 \le \arg(\eta_k - \tau) - \theta_k < \pi/(n+1-m) \quad\text{for all } \tau \text{ in } C .$$

In particular $\sum_j \sigma_j \zeta_j$ lies amidst the $\zeta_j$'s, and hence in $C$, for all relevant sets of values $\sigma_1, \sigma_2, \ldots, \sigma_m$. Therefore we should deduce that

$$0 \le \arg\Big( \prod_{k=m}^{n} \Big( \eta_k - \sum_{j=1}^{m} \sigma_j \zeta_j \Big) \Big) - \sum_{k=m}^{n} \theta_k < \pi .$$

It would then follow that the last integral, whose integrand is confined to a half-plane that excludes zero, could not vanish. This contradiction proves the lemma.

In particular, when $C$ is a circle of radius $\rho$, then $S$ turns out to be a concentric circle of radius $\rho \csc\big(\pi/(2(n+1-m))\big)$. In general $S$ cannot be enormously larger than $C$, so the desired simple zero of $x^{(m-1)}$ can always be found somewhere near a cluster of $m$ zeros of $x$.
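The lemma can be spot-checked numerically. The sketch below uses assumed data and hypothetical helper names. It places three zeros $1, 1.01, 1.02$ in the disc $C$ of radius $\rho = 0.01$ about $1.01$, and puts two more zeros far away. It then finds the zeros of $x''$ by Durand-Kerner iteration, which is our choice of root finder and not one the text prescribes. Finally it checks that one of those zeros lies within $\rho \csc(\pi/(2(n+1-m)))$ of the centre.

```python
import math


def poly_mul(a, b):
    """Product of two polynomials given by ascending coefficient lists."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out


def deriv(c):
    return [i * c[i] for i in range(1, len(c))]


def horner(c, z):
    acc = 0j
    for a in reversed(c):
        acc = acc * z + a
    return acc


def zeros(c, iters=500):
    """All zeros of a polynomial (ascending coefficients) by Durand-Kerner iteration."""
    monic = [a / c[-1] for a in c]
    n = len(c) - 1
    z = [(0.4 + 0.9j) ** k for k in range(n)]
    for _ in range(iters):
        new = []
        for i, zi in enumerate(z):
            denom = 1
            for j, zj in enumerate(z):
                if j != i:
                    denom *= zi - zj
            new.append(zi - horner(monic, zi) / denom)
        z = new
    return z


def check_lemma(cluster, others, center, rho):
    """Distance from the disc's centre to the nearest zero of x^(m-1), and the radius of S."""
    x = [1 + 0j]
    for r in cluster + others:
        x = poly_mul(x, [-r, 1])
    m, n = len(cluster), len(cluster) + len(others)
    d = x
    for _ in range(m - 1):
        d = deriv(d)
    radius = rho / math.sin(math.pi / (2 * (n + 1 - m)))
    return min(abs(eta - center) for eta in zeros(d)), radius


dist, radius = check_lemma([1.0, 1.01, 1.02], [-3.0, 4.0], center=1.01, rho=0.01)
print(dist <= radius)
```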

In general one cannot expect ill-conditioned zeros to cluster in an obvious way, and we must search instead for nearby polynomials on pejorative manifolds. Thus one comes to consider problems like this one:

Problem II.3: Given $x(\tau) = \sum_{j=0}^{n} x_j \tau^{n-j}$, find the nearest polynomial $x - y$ with an $m$-tuple zero, where $y(\tau) = \sum_{j=0}^{n} y_j \tau^{n-j}$.

We interpret "nearest" to mean that

$$\sum_{j} w_j\, |y_j|^2 ,$$

with given positive weights $w_j$, should be minimized.

This problem can be approached in a conventional way via Lagrange multipliers. The result is a set of $m$ equations

$$x^{(k)}(\zeta) = \sum_{i=0}^{m-2} \lambda_i \sum_{j} \frac{\big((n-j)!\big)^2\, \bar\zeta^{\,n-j-i}\, \zeta^{\,n-j-k}}{(n-j-i)!\,(n-j-k)!\; w_j} \qquad \text{for } k = 0,1,2,\ldots,m-1 .$$

Here the inner sum runs over those $j$ for which both exponents are non-negative. From these equations we eliminate the Lagrange multipliers $\lambda_i$ by setting to zero a determinant of the coefficients of $(1, \lambda_0, \lambda_1, \ldots, \lambda_{m-2})$.

The result is an equation to be solved for the $m$-tuple zero $\zeta$. The equation is not a polynomial equation, because both $\zeta$ and its complex conjugate $\bar\zeta$ appear. When $m = 2$ the equation is

$$\zeta\, x'(\zeta) = x(\zeta) \Big\{ \sum_j (n-j)\, |\zeta^{n-j}|^2 / w_j \Big\} \Big/ \Big\{ \sum_j |\zeta^{n-j}|^2 / w_j \Big\} .$$

It is not hard to solve for $\zeta$, though most of the solutions must be discarded as irrelevant.
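For real data and $m = 2$, the minimization can be sketched by brute force. The sketch below is our illustration under several assumptions. The coefficients are real, and a real candidate double zero $\zeta$ is scanned over a grid instead of solving the equation above. We also take $y_0 = 0$ so that $x - y$ stays monic, and the names are hypothetical. For each $\zeta$ it finds the $y$ of least $\sum_j w_j y_j^2$ satisfying the two linear conditions $y(\zeta) = x(\zeta)$ and $y'(\zeta) = x'(\zeta)$. It then keeps the best $\zeta$.

```python
def nearest_double_zero(x, w, grid):
    """Sketch for m = 2 with real coefficients and real zeta taken from a grid.

    x: coefficients x_0..x_n, highest power first; x_0 is kept fixed (y_0 = 0).
    w: positive weights w_0..w_n (w_0 unused).
    Returns (cost, zeta, y) minimising sum_j w_j * y_j**2 subject to x - y
    having a double zero at zeta.
    """
    n = len(x) - 1
    best = None
    for zeta in grid:
        # Rows of the constraints y(zeta) = x(zeta) and y'(zeta) = x'(zeta).
        u = [zeta ** (n - j) for j in range(1, n + 1)]
        v = [(n - j) * zeta ** (n - j - 1) if j < n else 0.0 for j in range(1, n + 1)]
        wj = w[1:]
        g11 = sum(a * a / c for a, c in zip(u, wj))
        g12 = sum(a * b / c for a, b, c in zip(u, v, wj))
        g22 = sum(b * b / c for b, c in zip(v, wj))
        r1 = sum(c * zeta ** (n - j) for j, c in enumerate(x))                      # x(zeta)
        r2 = sum(c * (n - j) * zeta ** (n - j - 1) for j, c in enumerate(x[:-1]))   # x'(zeta)
        det = g11 * g22 - g12 * g12
        alpha = (r1 * g22 - r2 * g12) / det
        beta = (g11 * r2 - g12 * r1) / det
        cost = alpha * r1 + beta * r2          # equals sum_j w_j * y_j**2
        if best is None or cost < best[0]:
            y = [0.0] + [(alpha * a + beta * b) / c for a, b, c in zip(u, v, wj)]
            best = (cost, zeta, y)
    return best


x = [1.0, -5.1, 7.4, -3.3]                      # (tau - 1)(tau - 1.1)(tau - 3)
cost, zeta, y = nearest_double_zero(x, [1.0] * 4, [0.9 + k * 1e-4 for k in range(3001)])
print(zeta, cost)
```

In the notation above, $y_j = (\alpha u_j + \beta v_j)/w_j$ with two multipliers. At a truly stationary $\zeta$ the second multiplier vanishes, which is how the $m = 2$ equation in the text arises; the grid only approximates that point.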

The problem becomes more interesting when $x(\tau)$ has real coefficients and, naturally, we require that $y(\tau)$ have real coefficients too.

However ugly these calculations may be, they are worth pursuing whenever $x$ has a badly ill-conditioned $m$-tuple zero $\zeta$. Suppose $\zeta$'s condition number $\gamma$ is huge. Proposition II.2 tells us that

$$\gamma = K \Big/ \prod_{j=m+1}^{n} |\zeta_j - \zeta| = m!\, K / |x^{(m)}(\zeta)|$$

for some modest $K$. We therefore see that $x$ differs by just a little from a polynomial

$$x(\tau) - y(\tau) = x(\tau) - \frac{x^{(m)}(\zeta)}{m!}\, (\tau - \zeta)^m$$

with an $(m+1)$-tuple zero $\zeta$. Indeed,

$$\|y\| = K\, \|(\tau - \zeta)^m\| / \gamma ,$$

so $x$ can be no farther than that from the multiply pejorative sub-manifold of polynomials with $(m+1)$-tuple zeros.
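The construction of $y$ above is easy to check in exact arithmetic. In this sketch (an assumed example with hypothetical helper names), $x(\tau) = (\tau-1)^2(\tau-1-\varepsilon)(\tau+2)$ has a double zero $\zeta = 1$ made ill-conditioned by the nearby zero $1+\varepsilon$. Subtracting $y(\tau) = x''(1)(\tau-1)^2/2!$ produces a triple zero, and the coefficient of $y$ is only $-3\varepsilon$.

```python
from fractions import Fraction
from math import factorial


def mul(a, b):
    """Product of polynomials with ascending coefficient lists."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out


def deriv(c):
    return [i * c[i] for i in range(1, len(c))]


def ev(c, t):
    acc = Fraction(0)
    for a in reversed(c):
        acc = acc * t + a
    return acc


def raise_multiplicity(x, zeta, m):
    """y(tau) = x^(m)(zeta) (tau - zeta)^m / m!, so that x - y has an (m+1)-tuple zero at zeta."""
    d = x
    for _ in range(m):
        d = deriv(d)
    y = [ev(d, zeta) / factorial(m)]
    for _ in range(m):
        y = mul(y, [-zeta, 1])
    return y


eps = Fraction(1, 1000)
x = [Fraction(1)]
for r in (1, 1, 1 + eps, -2):           # double zero at 1, a nearby zero at 1 + eps
    x = mul(x, [-r, 1])
y = raise_multiplicity(x, 1, 2)
z = [a - b for a, b in zip(x, y + [0] * (len(x) - len(y)))]
print([ev(z, 1), ev(deriv(z), 1), ev(deriv(deriv(z)), 1)])
print(y)
```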
Part III: Eigenproblems

"What I tell you three times is true."

Lewis Carroll, The Hunting of the Snark, Fit 1.

Let $\zeta$ be an $m$-tuple eigenvalue of the $n \times n$ matrix $Z$. Let $\delta Z$ run through infinitesimal perturbations so constrained that $Z + \delta Z$ continues to possess an $m$-tuple eigenvalue $\zeta + \delta\zeta$ near $\zeta$. We define

$$\gamma(\zeta, Z, \|\cdot\|) \equiv \sup |\delta\zeta| / \|\delta Z\| \quad\text{over such constrained } \delta Z$$

to be the condition number of $\zeta$ as an $m$-tuple eigenvalue of $Z$ with respect to some given matrix norm $\|\cdot\|$. The constraints on $\delta Z$ are complicated but indispensable when $m > 1$. Without them the condition number $\gamma$ would be either infinite or meaningless.

We shall obtain estimates for $\gamma$ which relate it to the norm of the spectral projector $P$ onto $Z$'s $m$-dimensional invariant subspace belonging to $\zeta$. $P$ is characterized by the equations

$$P^2 = P, \qquad PZ = ZP, \qquad \operatorname{rank}(P) = m, \qquad P(Z - \zeta)^m = 0 .$$

$P$ can be computed straightforwardly from the similarity transformation that exhibits $Z$'s Jordan normal form. For example, when $m = 1$, $\zeta$ has non-zero row and column eigenvectors $x^*$ and $y$ satisfying $x^* Z = \zeta x^*$ and $Z y = \zeta y$, and these yield $P = y x^* / x^* y$.

We shall find that, roughly speaking, $\gamma$ is big if and only if $\|P\|$ is big. Since $\gamma$ is appreciably more expensive to compute than $\|P\|$ when $m > 1$, we shall use $\|P\|$ instead of $\gamma$ as a measure of $\zeta$'s ill-condition.
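For $m = 1$ the formula $P = yx^*/x^*y$ is easy to try on a $2 \times 2$ upper triangular example (our assumed example, with hypothetical helper names). With eigenvalues $1$ and $1+\varepsilon$, the projector for $\zeta = 1$ has norm $\sqrt{1 + (t/\varepsilon)^2}$, which is huge when the eigenvalues nearly coincide. The test cross-checks $P$ against $(Z - (1+\varepsilon))/(1 - (1+\varepsilon))$, a formula valid for $2 \times 2$ matrices with distinct eigenvalues.

```python
import math


def projector_from_eigenvectors(xs, y):
    """P = y x* / (x* y) for a simple eigenvalue; xs is the row (left) eigenvector."""
    s = sum(a * b for a, b in zip(xs, y))
    return [[yi * xj / s for xj in xs] for yi in y]


def spectral_norm_2x2(A):
    """Largest singular value of a real 2x2 matrix via the eigenvalues of A^T A."""
    p = A[0][0] ** 2 + A[1][0] ** 2
    r = A[0][1] ** 2 + A[1][1] ** 2
    q = A[0][0] * A[0][1] + A[1][0] * A[1][1]
    return math.sqrt((p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q))


t, eps = 1.0, 1e-3
Z = [[1.0, t], [0.0, 1.0 + eps]]     # eigenvalues 1 and 1 + eps
xs = [1.0, -t / eps]                  # x* Z = 1 * x*
y = [1.0, 0.0]                        # Z y = 1 * y
P = projector_from_eigenvectors(xs, y)
print(P, spectral_norm_2x2(P))       # ||P|| = sqrt(1 + (t/eps)^2), about 1000
```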

Hypersensitivity to small perturbations, and the consequent risk of numerical instability, always accompany a spectral projector of large norm. This holds irrespective of whether the projector belongs to a multiple eigenvalue or to a cluster of simple eigenvalues of $Z$. Consider the spectral projector $P$ onto an $m$-dimensional invariant subspace belonging to a cluster of $m$ eigenvalues $\zeta_j$ (counting multiplicities). It is just the sum of the spectral projectors $P_j$ belonging to the distinct values $\zeta_j$. When $\|P\|/m$ is huge, at least one of the $\|P_j\|$'s must be huge too, so at least one $\zeta_j$ must be ill-conditioned.

We shall see other bad things happen. For example, consider any similarity transformation $Q$ which reduces $Z$ to a diagonal sum

$$Q^{-1} Z Q = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix},$$

in which the $m \times m$ matrix $B$ has as its spectrum the cluster of $m$ eigenvalues $\zeta_j$. Every such $Q$ is necessarily ill-conditioned, in the sense that $\|Q\|\,\|Q^{-1}\|$ must exceed $\|P\|$, roughly. Indeed, when $\|P\|/m$ is huge, the cluster's very identity as a cluster of $m$ eigenvalues may be jeopardized by small uncertainties or perturbations in $Z$. Why? Take any closed contour $\Gamma$ which encloses the cluster and excludes the rest of $Z$'s eigenvalues. To it corresponds at least one small perturbation $\Delta Z$, with $\|\Delta Z\| \le \kappa \|Z\| / \|P\|^{1/m}$ for a modest constant $\kappa$, such that $Z + \Delta Z$ has either fewer than $m$ or more than $m$ eigenvalues inside $\Gamma$.

In the special case when $Z$'s cluster contains just one $m$-tuple eigenvalue $\zeta$, the small perturbation $\Delta Z$ can be so chosen that the same $\zeta$ is an $(m+1)$-tuple eigenvalue of $Z + \Delta Z$. Our proof of this assertion will sharpen and generalize portentous results for $m = 1$ published earlier by Ruhe (1970) and Wilkinson (1972).

So spectral projectors of huge norm are critical symptoms of hypersensitivity to small perturbations. Moreover, no matrix can possess huge projectors unless tiny perturbations to its elements suffice to increase the multiplicities of some of its eigenvalues. Evidently the eigenproblem's pejorative manifolds and sub-manifolds consist of those matrices with various combinations of multiple eigenvalues (one double, one triple, two double, one quadruple, one double and one triple, ...).

Given a matrix $Z$, no convenient way is known yet to determine just how near $Z$ is to arbitrary pejorative sub-manifolds. Ways are known, however, to find points on some simpler pejorative sub-manifolds that are close enough to $Z$ for many practical purposes. These ways invoke unitary similarity transformations which reduce $Z$ to a block-upper-triangular form with diagonal blocks of small dimensionality. Each block is intended to correspond to a cluster of $Z$'s eigenvalues to which belongs a spectral projector of moderate norm, even though the spectral projectors belonging to every sub-cluster of the cluster have huge norms.

Such clusters often exist, but when they do they may not look like clusters to the naked eye. This is because the individual eigenvalues in the cluster are very ill-conditioned and disperse frantically in response to most small perturbations of $Z$. The eigenvalues in a cluster can be identified only by observing how their projectors combine: each eigenvalue's projector, though huge, cancels parts of the others' projectors in such a way that the sum of all the individual projectors has a moderate norm.

Having found suitable clusters and corresponding small blocks, we try to replace each block by its nearest like-dimensioned matrix with just one eigenvalue. This turns out to be tantamount to constructing, for each block, the nilpotent matrix nearest to it. Enough is known about that construction to make it cheap for small blocks (2×2 and 3×3), but for larger blocks no cheap construction is known yet.

The theory is extensive but incomplete. It lacks sharp indications of the distance from $Z$ to the various pejorative sub-manifolds. As a result, we could too often become enmeshed in expensive calculations of nearest nilpotent matrices whenever $Z$ is in an intermediate position. That happens when $Z$ is neither so far from all pejorative sub-manifolds that they are obviously ignorable, nor so near to some that it obviously indicates which ones are the only ones worth considering. Yet the theory is attractive. If it can be refined to cover the majority of cases that arise often in practice, it will be complete enough.

III.1: Some apparatus

Only the following matrix norms will be used:

$$\|X\|_2 = \sqrt{\operatorname{tr}(X^* X)} = \sqrt{\textstyle\sum (\text{singular values of } X)^2},$$

$$\|X\| = \max_{y \ne 0} \|X y\| / \|y\| = \text{maximum singular value of } X .$$

These norms have been chosen because they are not changed when $X$ is multiplied by a unitary matrix. Consequently they have many useful properties, which we will invoke with little comment; for details see Mirsky (1960) or Gohberg and Kreĭn (1969).

Given an $n \times n$ matrix $Z$, we shall sometimes identify a cluster of $m$ of its eigenvalues $\zeta_j$ (counting multiplicities) by specifying a closed contour $\Gamma$ in the complex plane. Such a contour encloses all of the cluster's $m$ eigenvalues strictly in its interior, leaving the rest of $Z$'s spectrum strictly outside. Some of the contours may have disconnected components, but none of them can pass through an eigenvalue of $Z$. We soon discover, after Kato (1966, p. 67), that

$$P = \frac{1}{2\pi i} \oint_\Gamma (\tau - Z)^{-1}\, d\tau$$

is the spectral projector onto $Z$'s invariant subspace belonging to the cluster of eigenvalues inside $\Gamma$. These eigenvalues are the $m$ non-trivial eigenvalues of $PZ = ZP$; the remaining $n-m$ eigenvalues of $PZ$ are just 0.
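The contour integral can be evaluated numerically. The sketch below is an illustration with hypothetical names. It applies the trapezoidal rule on a circle to a $2 \times 2$ example; the rule converges very fast for integrands analytic near the contour. A circle around one eigenvalue yields that eigenvalue's projector, and a circle around both yields the identity.

```python
import cmath
import math


def inv2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def spectral_projector(Z, center, radius, N=200):
    """Trapezoidal rule for P = (1 / 2 pi i) * contour integral of (tau - Z)^{-1} on a circle."""
    P = [[0j, 0j], [0j, 0j]]
    for k in range(N):
        w = radius * cmath.exp(2j * math.pi * k / N)     # tau - center; d tau = i w d theta
        tau = center + w
        Rinv = inv2([[tau - Z[0][0], -Z[0][1]], [-Z[1][0], tau - Z[1][1]]])
        for i in range(2):
            for j in range(2):
                P[i][j] += Rinv[i][j] * w / N
    return P


Z = [[1.0, 1.0], [0.0, 1.5]]
print(spectral_projector(Z, 1.0, 0.25))   # circle around the eigenvalue 1 only
print(spectral_projector(Z, 1.25, 1.0))   # circle around both eigenvalues
```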

There are other ways to represent $P$. We may aptly select a new (generally not orthogonal) coordinate system, or equivalently perform an apt similarity transformation, which will exhibit $Z$ in the reduced form $\begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$. Here the $m \times m$ matrix $B$ has as its eigenvalues just those inside the cluster, and $A$'s eigenvalues are outside. In that coordinate system $P$ appears as $\begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$.

Alternatively we may invoke Schur's theorem to obtain a new orthogonal coordinate system, or equivalently perform a unitary similarity, which will exhibit $Z$ in the (block-) upper triangular form

$$\begin{pmatrix} A & AR - RB \\ 0 & B \end{pmatrix}$$

in which $A$ and $B$ have the same spectra as before. Since $A$ and $B$ have no eigenvalue in common, any off-diagonal block can be written as $AR - RB$ for a unique $R$. The block is written that way for more convenient correlation with $P$ which, in the same coordinate system, has the form

$$P = \begin{pmatrix} 0 & -R \\ 0 & 1 \end{pmatrix} .$$
Lemma III.1.1: An $m \times m$ matrix $B$ satisfies $B^m = 0$ if and only if $\operatorname{tr}(B^k) = 0$ for $k = 1,2,\ldots,m$.

Proof: Apply Newton's identities (cf. Householder (1970)) to the sums of powers of $B$'s eigenvalues to deduce that those eigenvalues all vanish.
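Lemma III.1.1 gives a trace test for nilpotency. The sketch below compares it with the direct test $B^m = 0$ in exact integer arithmetic; exact zero comparisons like these are meaningful only in exact arithmetic. The example matrices and function names are our own. The first matrix is $SJS^{-1}$ with $S$ lower bidiagonal, so it is nilpotent by construction.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def nilpotent_by_traces(B):
    """Lemma III.1.1: B (m x m) is nilpotent iff tr(B^k) = 0 for k = 1, ..., m."""
    m, Bk = len(B), B
    for _ in range(m):
        if sum(Bk[i][i] for i in range(m)) != 0:
            return False
        Bk = matmul(Bk, B)
    return True


def nilpotent_by_power(B):
    """Direct check that B^m = 0."""
    Bk = B
    for _ in range(len(B) - 1):
        Bk = matmul(Bk, B)
    return all(v == 0 for row in Bk for v in row)


examples = {
    "S J S^-1 (3x3)": [[-1, 1, 0], [0, 0, 1], [1, -1, 1]],   # nilpotent by construction
    "swap (2x2)": [[0, 1], [1, 0]],                         # tr B = 0 but tr B^2 = 2
    "diag(1, -1, 0)": [[1, 0, 0], [0, -1, 0], [0, 0, 0]],    # tr B = 0 but tr B^2 = 2
}
for name, B in examples.items():
    print(name, nilpotent_by_traces(B), nilpotent_by_power(B))
```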

What conditions upon an infinitesimal perturbation $\delta B$ ensure that both $B$ and $B + \delta B$ are nilpotent? Another way to think of this question is to imagine that $B = B(\tau)$ is an analytic function of $\tau$ that stays nilpotent for all $\tau$. What characterizes $\dot B = dB/d\tau$ for all such functions? The question is not trivial. We may differentiate the equations $B^m = 0$ and $\operatorname{tr}(B^k) = 0$ to get respectively

$$\sum_{j=1}^{m} B^{j-1}\, \delta B\, B^{m-j} = 0 \quad\text{or}\quad \sum_{j=1}^{m} B^{j-1}\, \dot B\, B^{m-j} = 0, \quad\text{and}$$

$$\operatorname{tr}(B^{k-1}\, \delta B) = 0 \quad\text{or}\quad \operatorname{tr}(B^{k-1}\, \dot B) = 0 \quad\text{for } k = 1,2,\ldots,m .$$

However, those are merely necessary conditions upon $\delta B$ and $\dot B$; when $B$ is a derogatory nilpotent matrix those conditions fail to be sufficient. For example, when $B = 0$ those conditions impose almost no constraint upon $\delta B$ and $\dot B$, whereas these ought to satisfy $(\delta B)^m = 0$ and $\dot B^m = 0$.

Lemma III.1.2: When $B_0$ is a non-derogatory nilpotent matrix, the following three conditions are equivalent. They characterize the derivative $\dot B_0 = \dot B(0)$ of every nilpotent analytic function $B(\tau)$ which satisfies $B(0) = B_0$:

1) $\dot B_0 = S_0 B_0 - B_0 S_0$ is solvable for $S_0$;

2) $\sum_{k=1}^{m} B_0^{k-1}\, \dot B_0\, B_0^{m-k} = 0$;

3) $\operatorname{tr}(B_0^k\, \dot B_0) = 0$ for $k = 0,1,2,\ldots,m-1$.

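Proof: Without interpreting the dot as a derivative, we observe trivially that 1) implies 2) and 3).

To deduce that 3) implies 1), define the linear operator $\mathcal{B}$ thus: $\mathcal{B}X = X B_0 - B_0 X$. Any linear functional $\ell$ on the space of $m \times m$ matrices has the form $\ell Y = \operatorname{tr}(L Y)$ for some matrix $L$; in particular $\ell\,\mathcal{B}X = \operatorname{tr}(L\,\mathcal{B}X)$. But

$$\operatorname{tr}(L\,\mathcal{B}X) = \operatorname{tr}(L X B_0 - L B_0 X) = \operatorname{tr}(B_0 L X - L B_0 X) = -\operatorname{tr}\big((\mathcal{B}L)\, X\big) .$$

From Fredholm's theorem of the alternative (cf. Dunford-Schwartz (1958) p. 609), the equation $\mathcal{B} S_0 = \dot B_0$ is solvable for $S_0$ (perhaps not uniquely) if and only if $\ell \dot B_0 = 0$ for every $\ell$ which satisfies $\ell\mathcal{B} = 0$. We have just seen that $\ell\mathcal{B} = 0$ means $\mathcal{B}L = 0$, which implies $B_0 L = L B_0$. Since $B_0$ is non-derogatory, this in turn implies that $L$ is a polynomial in $B_0$ (cf. Gantmacher (1959) p. 222). So every $\ell$ which satisfies $\ell\mathcal{B} = 0$ has the form $\ell \dot B_0 = \operatorname{tr}(L \dot B_0) = \operatorname{tr}\big((\text{polynomial in } B_0)\, \dot B_0\big)$. This must vanish because of 3) and the fact that $B_0^k = 0$ for $k \ge m$. Therefore 3) implies 1).

Next let us deduce that 2) implies 3). Since $B_0$ is non-derogatory and nilpotent, it must be similar to

$$J = \begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix} .$$

Suppose the similarity that takes $B_0$ to $J$ takes $\dot B_0$ to $X = (x_{ij})$. Then, by 2), $X$ must satisfy

$$\sum_{k=1}^{m} J^{k-1} X J^{m-k} = 0, \quad\text{i.e.}\quad \sum_{k} x_{i+k-1,\; j-m+k} = 0 \quad\text{for } 1 \le i, j \le m,$$

where the sum runs over those $k$ for which both subscripts lie between $1$ and $m$.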
This is soon recognized as equivalent to

$$\sum_{j=1}^{m-k} x_{j+k,\, j} = 0 \quad\text{for } k = 0,1,\ldots,m-1, \quad\text{i.e.}\quad \operatorname{tr}(J^k X) = 0 .$$

Reversing the similarity yields 3).

Finally we demonstrate the existence of an analytic nilpotent $B(\tau)$ that interpolates as $B(0) = B_0$ and $\dot B(0) = \dot B_0$. Solve 1) for $S_0$ and set $S(\tau) \equiv 1 + \tau S_0$ and $B(\tau) = S(\tau)\, B_0\, S(\tau)^{-1}$. Then $B(\tau)$ is nilpotent (and non-derogatory), since it is similar to $B_0$, at least for $\tau$ small enough. And $\dot B(0) = S_0 B_0 - B_0 S_0 = \dot B_0$.
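A quick exact-arithmetic check of lemma III.1.2 follows; it is our illustration, with hypothetical names. Starting from condition 1) with an arbitrary integer $S_0$ and $B_0 = J$, conditions 2) and 3) come out exactly. The second call shows the derogatory failure mentioned earlier. For $B_0 = 0$ and $\dot B_0 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, conditions 2) and 3) both hold. Yet $\dot B_0$ is not of the form $S_0 B_0 - B_0 S_0 = 0$, and $B_0 + \tau \dot B_0$ is not nilpotent.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def matpow(B, k):
    P = [[int(i == j) for j in range(len(B))] for i in range(len(B))]
    for _ in range(k):
        P = matmul(P, B)
    return P


def trace(X):
    return sum(X[i][i] for i in range(len(X)))


def check_conditions(B0, Bdot):
    """Return (condition 2 holds, [tr(B0^k Bdot) for k = 0..m-1])."""
    m = len(B0)
    total = [[0] * m for _ in range(m)]
    for k in range(1, m + 1):
        term = matmul(matmul(matpow(B0, k - 1), Bdot), matpow(B0, m - k))
        total = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(total, term)]
    cond2 = all(v == 0 for row in total for v in row)
    return cond2, [trace(matmul(matpow(B0, k), Bdot)) for k in range(m)]


m = 4
J = [[int(j == i + 1) for j in range(m)] for i in range(m)]     # non-derogatory nilpotent
S0 = [[1, 2, 0, -1], [3, 0, 1, 2], [0, -2, 1, 1], [4, 1, 0, 3]]  # arbitrary
SJ, JS = matmul(S0, J), matmul(J, S0)
Bdot = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(SJ, JS)]  # condition 1)
print(check_conditions(J, Bdot))                                  # (True, [0, 0, 0, 0])

# Derogatory contrast: B0 = 0. Conditions 2) and 3) hold, but 1) would force Bdot = 0.
Z2 = [[0, 0], [0, 0]]
print(check_conditions(Z2, [[0, 1], [1, 0]]))                     # (True, [0, 0])
```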

We lack anything comparable to lemma III.1.2 for derogatory nilpotent matrices, so we should like to avoid them. That is not difficult to do. In the manifold of nilpotent matrices the non-derogatory ones constitute a dense open set. That this is true can be inferred from the Jordan normal form, in a way that will be left to the reader.

III.2: The condition number of a multiple eigenvalue

Let $\zeta$ be an $m$-tuple eigenvalue of an $n \times n$ matrix $Z$, and let $P$ be $\zeta$'s spectral projector. We shall estimate the condition number

$$\gamma = \gamma(\zeta, Z, \|\cdot\|_2) = \sup |\delta\zeta| / \|\delta Z\|_2 ,$$

where the supremum is taken over all infinitesimal $\delta Z$ such that $Z + \delta Z$ continues to possess an $m$-tuple eigenvalue $\zeta + \delta\zeta$ near $\zeta$.

We shall show that

$$\gamma \le \|P\|_2 / m .$$

Furthermore, suppose the restriction of $Z$ to $P$'s range is non-derogatory. That is, $Z$ has only one eigenvector belonging to $\zeta$ or, equivalently, $P(Z-\zeta)^m = 0 \ne P(Z-\zeta)^{m-1}$. In that case we shall show that $\gamma$ can be computed straightforwardly, though expensively, by solving a linear least-squares problem:

$$\gamma = \min_{\lambda_k} \Big\| P\Big(1 - \sum_{k=1}^{m-1} \lambda_k (Z - \zeta)^k\Big) \Big\|_2 \Big/ m .$$

In this case we shall conclude that

$$m^{-1/2} \big( \|P\|_2^2 + 1 - m \big)^{1/(2m)} \le \gamma \le \|P\|_2 / m .$$

The upper and lower bounds for $\gamma$ are far apart when $m > 1$ and $\|P\|_2$ is big. Even so, each bound can be achieved by an appropriate and non-trivial example.

Here is how those claims are proved. Recall that, provided no eigenvalue of $Z$ lies on the closed contour $\Gamma$,

$$P = \frac{1}{2\pi i} \oint_\Gamma (\tau - Z)^{-1}\, d\tau$$

is the spectral projector upon $Z$'s invariant subspace belonging to the eigenvalues inside $\Gamma$. Suppose there are $m$ such eigenvalues. Then their average value is

$$\mu = \operatorname{tr}(PZ)/m .$$

Since no eigenvalue of $Z$ lies on $\Gamma$, both $P$ and $\mu$ are continuously differentiable functions of $Z$. In fact an infinitesimal perturbation $\delta Z$ causes $P$ and $\mu$ to change by (cf. Kato (1966) pp. 76 and 79)

$$\delta P = \frac{1}{2\pi i} \oint_\Gamma (\tau - Z)^{-1}\, \delta Z\, (\tau - Z)^{-1}\, d\tau \quad\text{and}\quad \delta\mu = \operatorname{tr}(P\, \delta Z)/m ,$$

the latter because $\operatorname{tr}(Z\, \delta P) = 0$.
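The formula $\delta\mu = \operatorname{tr}(P\,\delta Z)/m$ can be checked against a finite difference. The sketch uses an assumed $2 \times 2$ example with $m = 1$ and hypothetical names. It perturbs $Z$ along a fixed direction $E$, then compares the central difference of the eigenvalue near $1$ with $\operatorname{tr}(PE)$.

```python
import math


def eigenvalues_2x2(Z):
    """Both eigenvalues of a real 2x2 matrix with real spectrum, smaller first."""
    tr = Z[0][0] + Z[1][1]
    det = Z[0][0] * Z[1][1] - Z[0][1] * Z[1][0]
    root = math.sqrt(tr * tr / 4 - det)
    return tr / 2 - root, tr / 2 + root


Z = [[1.0, 1.0], [0.0, 1.5]]
P = [[1.0, -2.0], [0.0, 0.0]]              # projector for the eigenvalue 1 (m = 1)
E = [[0.3, -0.7], [0.2, 0.5]]              # an arbitrary perturbation direction


def lam(h):
    Zh = [[Z[i][j] + h * E[i][j] for j in range(2)] for i in range(2)]
    return eigenvalues_2x2(Zh)[0]          # the eigenvalue near 1


h = 1e-6
finite_diff = (lam(h) - lam(-h)) / (2 * h)
predicted = sum(P[i][j] * E[j][i] for i in range(2) for j in range(2))   # tr(P E)
print(finite_diff, predicted)              # both close to -0.1
```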

We are interested in a special case. All $m$ eigenvalues inside $\Gamma$ are coincident at $\zeta$, and the perturbation $\delta Z$ is so constrained that all $m$ perturbed eigenvalues inside $\Gamma$ stay coincident at $\zeta + \delta\zeta$. In this case $\mu = \zeta$ and $\delta\mu = \delta\zeta$, so $\zeta$'s condition number $\gamma(\zeta, Z, \|\cdot\|_2)$ satisfies

$$\begin{aligned}
\gamma &= \sup |\delta\mu| / \|\delta Z\|_2 && \text{over constrained } \delta Z \\
&= \tfrac{1}{m} \sup |\operatorname{tr}(P\, \delta Z)| / \|\delta Z\|_2 && \text{over constrained } \delta Z \\
&\le \tfrac{1}{m} \sup |\operatorname{tr}(P\, \delta Z)| / \|\delta Z\|_2 && \text{over all } \delta Z \\
&= \|P\|_2 / m .
\end{aligned}$$

Thus we conclude that an ill-conditioned eigenvalue must have a spectral projector of large norm. We shall next show to what extent the converse is true. After that we shall show how, given $m$ and a value $\|P\|_2$, to construct a matrix $Z$ with $\gamma = \|P\|_2/m$.

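To obtain a sharper estimate for $\gamma$ we must take the constraints upon $\delta Z$ into account. We shall now do that, but just in the non-derogatory case when $P(Z-\zeta)^{m-1} \ne 0$. By lemma III.1.1, applied to the restriction of $Z - \zeta$ to $P$'s range, the equation $P(Z-\zeta)^m = 0$ is equivalent to

$$\operatorname{tr}\big(P (Z-\zeta)^k\big) = 0 \quad\text{for } k = 1,2,\ldots,m .$$

When differentiated, this yields

$$k \operatorname{tr}\big(P (Z-\zeta)^{k-1} (\delta Z - \delta\zeta)\big) + \operatorname{tr}\big((Z-\zeta)^k\, \delta P\big) = 0 .$$

The last term vanishes because

$$\operatorname{tr}\big((Z-\zeta)^k\, \delta P\big) = \frac{1}{2\pi i} \oint_\Gamma \operatorname{tr}\big((Z-\zeta)^k (\tau - Z)^{-1} \delta Z (\tau - Z)^{-1}\big)\, d\tau = \operatorname{tr}\Big( \frac{1}{2\pi i} \oint_\Gamma (Z-\zeta)^k (\tau - Z)^{-2}\, d\tau \;\, \delta Z \Big) = 0 ,$$

since $(Z-\zeta)^k (\tau - Z)^{-2}$ is the derivative with respect to $\tau$ of $-(Z-\zeta)^k (\tau - Z)^{-1}$. Furthermore the coefficient of $\delta\zeta$, namely $-k \operatorname{tr}\big(P (Z-\zeta)^{k-1}\big)$, already vanishes when $k > 1$. Therefore $\delta Z$ necessarily satisfies

$$\operatorname{tr}\big((Z-\zeta)^{k-1} P\, \delta Z\big) = 0 \quad\text{for } k = 2,3,\ldots,m .$$

What remains to be shown is that these conditions upon $\delta Z$ are also sufficient. That is, they must ensure that $\zeta + \delta\zeta$, with $\delta\zeta = \operatorname{tr}(P\, \delta Z)/m$, is an $m$-tuple eigenvalue of $Z + \delta Z$.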

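Let us choose a coordinate system in which

$$Z - \zeta = \begin{pmatrix} A & 0 \\ 0 & B \end{pmatrix}$$

with non-singular $A$ and an $m \times m$ matrix $B$ which must satisfy $B^m = 0 \ne B^{m-1}$. Now $P = \begin{pmatrix} 0 & 0 \\ 0 & 1 \end{pmatrix}$. Let $\delta Z = \begin{pmatrix} \delta Z_{11} & \delta Z_{12} \\ \delta Z_{21} & \delta Z_{22} \end{pmatrix}$ satisfy the conditions in question:

$$\operatorname{tr}\big((Z-\zeta)^k P\, \delta Z\big) = \operatorname{tr}(B^k\, \delta Z_{22}) = 0 \quad\text{for } k = 1,2,\ldots,m-1 .$$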


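We wish to infer that $Z + \delta Z$ has an $m$-tuple eigenvalue $\zeta + \delta\zeta$. We shall do so by constructing a non-singular matrix

$$1 - \delta S = \begin{pmatrix} 1 - \delta S_{11} & -\delta S_{12} \\ -\delta S_{21} & 1 - \delta S_{22} \end{pmatrix}$$

(differing infinitesimally from $1$) for which

$$(1 - \delta S)^{-1} (Z + \delta Z - \zeta - \delta\zeta)(1 - \delta S) = \begin{pmatrix} A + \delta A & 0 \\ 0 & B \end{pmatrix} .$$

When this similarity relation is pre-multiplied by $(1 - \delta S)$, we find that $\delta S$ and $\delta A$ must satisfy

$$\begin{aligned}
\delta A &= \delta Z_{11} - \delta\zeta + \delta S_{11} A - A\, \delta S_{11}, & A\, \delta S_{12} - \delta S_{12} B &= \delta Z_{12}, \\
B\, \delta S_{21} - \delta S_{21} A &= \delta Z_{21}, & B\, \delta S_{22} - \delta S_{22} B &= \delta Z_{22} - \delta\zeta .
\end{aligned}$$

These equations are obviously solvable for $\delta A$, $\delta S_{11}$ (arbitrary),

$$\delta S_{21} = -\sum_{k=0}^{m-1} B^k\, \delta Z_{21}\, A^{-k-1} \quad\text{and}\quad \delta S_{12} = \sum_{k=0}^{m-1} A^{-k-1}\, \delta Z_{12}\, B^k .$$

The solution $\delta S_{22}$ of the last equation is not so obvious. However, lemma III.1.2 provides assurance that a solution $\delta S_{22}$ does exist provided $\delta\zeta = \operatorname{tr}(\delta Z_{22})/m = \operatorname{tr}(P\, \delta Z)/m$. In that case the conditions 3) of lemma III.1.2 are satisfied with $B_0 = B$ and $\dot B_0 = \delta Z_{22} - \delta\zeta$.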

Of course, the foregoing manipulations with infinitesimals $\delta S_{ij}$ can be re-interpreted in terms of derivatives, along the lines of lemma III.1.2 and the matrix $S(\tau)$ constructed there.

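We now know the necessary and sufficient constraints upon $\delta Z$ etc., provided $P(Z-\zeta)^{m-1} \ne 0$. They are

$$\operatorname{tr}\big(P (Z-\zeta)^k\big) = 0 \quad\text{and}\quad \operatorname{tr}\big(P (Z-\zeta)^{k-1} (\delta Z - \delta\zeta)\big) = 0 \quad\text{for } k = 1,2,\ldots,m .$$

With these we return to the computation of

$$\gamma = \sup |\delta\zeta| / \|\delta Z\|_2 \quad\text{over constrained } \delta Z .$$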

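The computation will be carried out in a new orthogonal coordinate system in which

$$Z - \zeta = \begin{pmatrix} A & AR - RB \\ 0 & B \end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix} 0 & -R \\ 0 & 1 \end{pmatrix} ,$$

where $A$ is non-singular and $B^m = 0 \ne B^{m-1}$. Once again we set $\delta Z = \begin{pmatrix} \delta Z_{11} & \delta Z_{12} \\ \delta Z_{21} & \delta Z_{22} \end{pmatrix}$. Now the constraints take the form

$$\delta\zeta = \operatorname{tr}(\delta Z_{22} - \delta Z_{21} R)/m \quad\text{and}\quad \operatorname{tr}\big((\delta Z_{22} - \delta Z_{21} R) B^k\big) = 0 \quad\text{for } k = 1,2,\ldots,m-1 .$$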


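Therefore

$$\begin{aligned}
\gamma^2 &= \sup |\delta\zeta|^2 / \|\delta Z\|_2^2 \\
&= \sup \frac{|\operatorname{tr}(\delta Z_{22} - \delta Z_{21} R)|^2}{m^2 \big( \|\delta Z_{11}\|_2^2 + \|\delta Z_{12}\|_2^2 + \|\delta Z_{21}\|_2^2 + \|\delta Z_{22}\|_2^2 \big)} \\
&= \sup \frac{|\operatorname{tr}(\delta Z_{22} - \delta Z_{21} R)|^2}{m^2 \big( \|\delta Z_{21}\|_2^2 + \|\delta Z_{22}\|_2^2 \big)} ,
\end{aligned}$$

each supremum being taken over constrained $\delta Z$. Here we have set $\delta Z_{11} = 0$ and $\delta Z_{12} = 0$, because any other values would only diminish the quotient we are trying to maximize. The desired supremum may now be located by standard variational techniques, which we shall merely summarize and verify. First, though, we shall drop the $\delta$ in front of $\delta Z_{22}$ and $\delta Z_{21}$, since the quotient and the constraints are homogeneous functions.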

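Let

$$C = \Big(1 - \sum_{k=1}^{m-1} \lambda_k B^k\Big)^{*} \begin{pmatrix} -R^* & 1 \end{pmatrix} ,$$

with the coefficients $\lambda_k$ so chosen that $\|C\|_2^2 = \operatorname{tr}(C C^*)$ is minimized. The $\lambda_k$'s are the solutions of the normal equations

$$\operatorname{tr}\Big( C \begin{pmatrix} -R \\ 1 \end{pmatrix} B^k \Big) = 0 \quad\text{for } k = 1,2,\ldots,m-1 .$$

These equations are linear in the $\lambda_j$'s. They are also non-singular: since $B^m = 0 \ne B^{m-1}$, the polynomial $\sum_{j=1}^{m-1} \lambda_j B^j$ cannot vanish unless all the $\lambda_j$'s vanish. The normal equations for $C$ coincide with the constraints that $(Z_{21} \;\, Z_{22})$ must satisfy (recall that the $\delta$'s have been dropped). So $C$ is a permissible choice for $(Z_{21} \;\, Z_{22})$. It differs from any other choice by a matrix $Y = (Z_{21} \;\, Z_{22}) - C$, which must satisfy the same constraints, namely

$$\operatorname{tr}\Big( Y \begin{pmatrix} -R \\ 1 \end{pmatrix} B^k \Big) = 0 \quad\text{for } k = 1,2,\ldots,m-1 .$$

We are about to discover that the following quotient can achieve its supremum only when $(Z_{21} \;\, Z_{22})$ is a non-zero scalar multiple of $C$:

$$\begin{aligned}
\frac{|\operatorname{tr}(Z_{22} - Z_{21} R)|^2}{m^2\, \|(Z_{21} \;\, Z_{22})\|_2^2}
&= \frac{\Big|\operatorname{tr}\Big((C+Y)\begin{pmatrix} -R \\ 1 \end{pmatrix}\Big)\Big|^2}{m^2\, \|C+Y\|_2^2}
 = \frac{\Big|\operatorname{tr}\Big((C+Y)\Big(C^* + \sum_{k=1}^{m-1} \lambda_k \begin{pmatrix} -R \\ 1 \end{pmatrix} B^k\Big)\Big)\Big|^2}{m^2\, \|C+Y\|_2^2} \\
&= \frac{\big|\|C\|_2^2 + \operatorname{tr}(Y C^*)\big|^2}{m^2\, \|C+Y\|_2^2}
 = \frac{\|C\|_2^2}{m^2} - \frac{\|Y\|_2^2\, \|C\|_2^2 - |\operatorname{tr}(Y C^*)|^2}{m^2\, \|C+Y\|_2^2}
 \le \frac{\|C\|_2^2}{m^2} .
\end{aligned}$$

Equality holds only when $Y$ is a scalar multiple of $C$. Therefore we have proved that, in the non-derogatory case,

$$\gamma(\zeta, Z, \|\cdot\|_2) = \|C\|_2 / m = \min_{\lambda_k} \Big\| \Big(1 - \sum_{k=1}^{m-1} \lambda_k B^k\Big)^{*} \begin{pmatrix} -R^* & 1 \end{pmatrix} \Big\|_2 \Big/ m = \min_{\lambda_k} \Big\| P\Big(1 - \sum_{k=1}^{m-1} \lambda_k (Z-\zeta)^k\Big) \Big\|_2 \Big/ m .$$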

Our  next  task  is  to  secure  a  lower  bound  for  y.  Write 
r^n  1.  k 

[/  =  1-  \B  and  let  U's  singular  values  in  order  be 
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2  2  2 

0.  >  0.  >•••>  0  >  0.  Also  write  p  =  8PIL  ■  BP!!  -m.  We 

i  —  L  —  —  m  22 

*  * 

seek  a  lower  bound  in  terms  of  p  and  m  for  Ul  (-P  1)8 _/m. 


Evidently 


„  *  *  „ ?  *  *  2  2 

i;  v-a  Dij  -  w  r  r2  +  lu  r2 

>  P2p2  +  tr .(U*U)  ; 

—  Ttf 

it  it  2  * 

the  last  inequality  is  achieved  lust  when  UU  R  •  a  R  .  What  do 
1  -  m 

we  know  about  U 's  singular  values  0_.?  Since  ■  0,  det.u/  »  1 

2  2  2  .  * 

and  hence  0,a«***c  *  1  while  C,  +  a„ -t- *  •  •  +  a  3  tr  .(UU). 

12m  12  m 

Therefore 


*  *  222  2  22 

l£/  (-/?  nr>a  +  o>--+az  .  +  (i+p  )o 

2—1  /  m-1  m 

,JA2...2  2„  20l/m 


>  w(a1o2---om_1am(i+p  )) 


,,,  2,1/ji 

m  ( 1+p  ) 


with  equality  in  the  inequality  between  arithmetic  and  geometric 


means  just  when 


o  -a  /l+pT  . 

m-1  m 


Assembling  the  relevant  relations  above  yield* 


y  >  m  1/2(!iFQ2  +  l-m)1/2'’7, 


as claimed. The final tasks are to demonstrate that the bounds are achievable. Briefly, to achieve the upper bound $\gamma = \|P\|_F/\sqrt{m}$ it suffices that $R^*R$ above be diagonal. To achieve the lower bound it suffices that

$$B = \begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ & & & 0 \end{pmatrix}_{m \times m}$$

and that $R$ have the form $R = yx^*$, where

$$x^* = (\sigma^{1-m}, \sigma^{2-m}, \ldots, \sigma^{-2}, \sigma^{-1}, 1), \qquad y^*y = \sigma^{2m-2}(\sigma^2 - 1),$$

for some arbitrary $\sigma > 1$. It will turn out that $U = I - (\sigma - \sigma^{-1})(I - \sigma^{-1}B)^{-1}B$ and $R^*R + I = \sigma^2(UU^*)^{-1}$. The details are, once again, left to the reader.
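The extremal example for the lower bound can be spot-checked numerically. The sketch below works in real arithmetic under our reading of the garbled displays: it assumes $B$ is the $m\times m$ nilpotent Jordan block, $R = yx^*$ with $x^* = (\sigma^{1-m},\ldots,\sigma^{-1},1)$ and $y^*y = \sigma^{2m-2}(\sigma^2-1)$, and $U = I - (\sigma-\sigma^{-1})(I-\sigma^{-1}B)^{-1}B$. It checks $(R^*R+I)UU^* = \sigma^2 I$, that $(\|P\|_F^2+1-m)^{1/(2m)} = \sigma$, and that $\|U^*(-R^*\;\;I)\|_F/\sqrt{m}$ also equals $\sigma$, so the lower bound is attained. All function names are ours.

```python
import math


def identity(m):
    return [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def add(X, Y, alpha=1.0):
    """Return X + alpha * Y."""
    return [[a + alpha * b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def extremal_example(m, sigma):
    """R^T R and U for the (reconstructed) lower-bound example; real case, sigma > 1."""
    B = [[1.0 if j == i + 1 else 0.0 for j in range(m)] for i in range(m)]  # B^m = 0
    x = [sigma ** (k + 1 - m) for k in range(m)]          # (s^(1-m), ..., s^(-1), 1)
    yty = sigma ** (2 * m - 2) * (sigma ** 2 - 1)         # y^T y
    RtR = [[yty * x[i] * x[j] for j in range(m)] for i in range(m)]  # R = y x^T
    # (I - B/sigma)^(-1) = I + B/sigma + ... + (B/sigma)^(m-1), since B is nilpotent
    inv, term = identity(m), identity(m)
    for _ in range(m - 1):
        term = [[v / sigma for v in row] for row in matmul(term, B)]
        inv = add(inv, term)
    XB = [[(sigma - 1 / sigma) * v for v in row] for row in matmul(inv, B)]
    U = add(identity(m), XB, -1.0)                        # U = I - X B
    return RtR, U


def check(m, sigma):
    """Return (residual of (R^T R + I) U U^T = sigma^2 I, lower bound, attained value)."""
    RtR, U = extremal_example(m, sigma)
    M = matmul(add(RtR, identity(m)), matmul(U, transpose(U)))
    residual = max(abs(M[i][j] - (sigma ** 2 if i == j else 0.0))
                   for i in range(m) for j in range(m))
    p_f2 = m + sum(RtR[i][i] for i in range(m))          # ||P||_F^2 = m + ||R||_F^2
    lower = (p_f2 + 1 - m) ** (1 / (2 * m))
    # ||U^T (-R^T  I)||_F^2 = trace((R^T R + I) U U^T); divide by m, take square root
    attained = math.sqrt(sum(M[i][i] for i in range(m)) / m)
    return residual, lower, attained


print(check(3, 2.0))
```

For instance `check(3, 2.0)` should give a residual at rounding level and two values equal to 2.0 up to rounding.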

Since $\gamma$ is huge if and only if $\|P\|_F$ is huge, even though they may still be orders of magnitude apart, we shall henceforth dispense with $\gamma$ and use only $\|P\|_F$ or $\|P\|$ as our measure of ill-condition.

III.3 What happens when $\|P\|$ is huge?



We  shall  consider  now  some  of  the  ugly  phenomena  associated 
with  spectral  projectors  of  huge  norm. 

Proposition III.3.1: If $\zeta$ is an $m$-tuple eigenvalue of $Z$ and $P$ its spectral projector, then there exists a perturbation $\Delta Z$ such that $\zeta$ is an $(m+1)$-tuple eigenvalue of $Z + \Delta Z$ and $\|\Delta Z\| \le \|Z - \zeta\|/\|P\|^{1/m}$, so $\|\Delta Z\|$ is small if $\|P\|$ is huge.

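Proof: By a unitary similarity exhibit

$$Z - \zeta = \begin{pmatrix} A & AR - RB \\ 0 & B \end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix} 0 & -R \\ 0 & I \end{pmatrix},$$

where $A$ is non-singular and $B^m = 0$. Evidently

$$(Z - \zeta)^m = \begin{pmatrix} A^m & A^mR \\ 0 & 0 \end{pmatrix},$$

so, since $\|P\| = \sqrt{1 + \|R\|^2} = \|(I \;\; R)\|$, we find that

$$\|P\| = \|(I \;\; R)\| = \|A^{-m}(A^m \;\; A^mR)\| \le \|A^{-m}\|\,\|Z - \zeta\|^m .$$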

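How close is $A$ to its nearest singular neighbour $A + \Delta A$? We know (see part I) that such a $\Delta A$ can be found with $\|\Delta A\| = 1/\|A^{-1}\|$, and the previous inequality, together with $\|A^{-m}\| \le \|A^{-1}\|^m$, shows that this $\Delta A$ satisfies $\|\Delta A\| \le \|Z - \zeta\|/\|P\|^{1/m}$. Therefore we can use

$$\Delta Z = \begin{pmatrix} \Delta A & 0 \\ 0 & 0 \end{pmatrix}$$

to achieve what has been claimed.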
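The block identity for $(Z-\zeta)^m$ used in this proof holds for any $R$ once $B^m = 0$. A minimal sketch checks it in exact integer arithmetic; the blocks `A`, `B`, `R` are arbitrary illustrative choices, not taken from the text, and `W` stands for $Z - \zeta$.

```python
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def sub(X, Y):
    return [[a - b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]


def block_upper(A, X, B):
    """Assemble [[A, X], [0, B]] from A (p x p), X (p x m) and B (m x m)."""
    p, m = len(A), len(B)
    return [A[i] + X[i] for i in range(p)] + [[0] * p + B[i] for i in range(m)]


def power(W, k):
    result = [[int(i == j) for j in range(len(W))] for i in range(len(W))]
    for _ in range(k):
        result = matmul(result, W)
    return result


# Illustrative integer data: A nonsingular, B strictly upper triangular so B^m = 0.
A = [[2, 1], [-1, 3]]
B = [[0, 5, -2], [0, 0, 4], [0, 0, 0]]
R = [[1, -2, 3], [0, 4, 1]]
m = len(B)

W = block_upper(A, sub(matmul(A, R), matmul(R, B)), B)    # stands for Z - zeta
Am = power(A, m)
expected = block_upper(Am, matmul(Am, R), [[0] * m for _ in range(m)])
print(power(W, m) == expected)
```

Integer arithmetic makes the comparison exact, so no tolerance is needed.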

This proposition slightly sharpens one of Wilkinson's (1972) when $m = 1$, in which case the proposition is best-possible without more information about $Z$ than $\|Z - \zeta\|$ and $\|P\|$. Even so, it can be somewhat misleading. Consider an example used by G. E. Forsythe:


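$$Z = \begin{pmatrix} 0 & 1 & & \\ & 0 & \ddots & \\ & & \ddots & 1 \\ \epsilon^n & & & 0 \end{pmatrix}_{n \times n}.$$

Here we may assume $n > 10$, and $\epsilon$ small and positive, say $\epsilon < 1/10$. $Z$ has $n$ distinct eigenvalues equally spaced around a circle of radius $\epsilon$, all with the same condition number

$$\gamma = \|\text{Projector}\| = n^{-1}\epsilon^{1-n}(1 - \epsilon^{2n})/(1 - \epsilon^2) .$$

Consequently, the proposition says that $Z$ is no farther from a matrix with a double eigenvalue than roughly $n\epsilon^{n-1}$; in fact, $Z$ is within $\epsilon^n$ of a matrix with an $n$-tuple eigenvalue (delete its corner entry).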
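The numbers in Forsythe's example can be reproduced with a short sketch. It assumes $Z$ has ones on its superdiagonal, $\epsilon^n$ in its bottom-left corner and zeros elsewhere (our reading of the example), takes the eigenvalue $\lambda = \epsilon e^{2\pi i/n}$ with right eigenvector $(1,\lambda,\ldots,\lambda^{n-1})$ and left eigenvector $(1,\lambda^{-1},\ldots,\lambda^{1-n})$, and compares the projector's norm with the closed form. The bound from Proposition III.3.1 is estimated with $\|Z-\zeta\| \le \|Z\| + |\zeta| = 1 + \epsilon$.

```python
import cmath
import math


def forsythe_times(v, corner):
    """Z v for the assumed n x n Forsythe matrix: ones on the superdiagonal,
    `corner` (= eps**n) in the bottom-left position, zeros elsewhere."""
    return [v[i + 1] for i in range(len(v) - 1)] + [corner * v[0]]


def projector_norm(n, eps):
    """Norm of the rank-one spectral projector x y^T / (y^T x) for one eigenvalue."""
    lam = eps * cmath.exp(2j * math.pi / n)          # lam**n == eps**n
    x = [lam ** k for k in range(n)]                  # right eigenvector
    y = [lam ** (-k) for k in range(n)]               # left eigenvector: y^T Z = lam y^T
    residual = max(abs(a - lam * b) for a, b in zip(forsythe_times(x, eps ** n), x))
    norm = lambda v: math.sqrt(sum(abs(t) ** 2 for t in v))
    gamma = norm(x) * norm(y) / abs(sum(a * b for a, b in zip(y, x)))
    return gamma, residual


n, eps = 10, 0.05
gamma, residual = projector_norm(n, eps)
formula = (1 - eps ** (2 * n)) / (n * eps ** (n - 1) * (1 - eps ** 2))
bound = (1 + eps) / gamma   # Proposition III.3.1 with m = 1
actual = eps ** n           # deleting the corner entry leaves a nilpotent Jordan block
print(gamma / formula, bound, actual)
```

With $n = 10$ and $\epsilon = 0.05$ the proposition's bound exceeds the true distance $\epsilon^n$ by a factor of roughly $n/\epsilon$.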

When $m > 1$, Proposition III.3.1 probably far over-estimates the distance to the nearest matrix $Z + \Delta Z$ with an $(m+1)$-tuple eigenvalue.

We  now  turn  to  the  spectral  projectors  belonging  to  clusters 
of  eigenvalues  of  unspecified  multiplicities,  and  demonstrate  why 
projectors  of  large  norm  are  to  be  avoided. 

Proposition III.3.2: Let $\Gamma$ be a closed contour in the complex plane which separates $Z$'s spectrum into two parts: $m$ eigenvalues (counting multiplicities) strictly inside $\Gamma$ and the rest strictly outside. And let $P$ be the spectral projector onto $Z$'s invariant subspace belonging to the $m$ eigenvalues inside $\Gamma$. Whenever $\|P\|$ is huge, in particular whenever $\|P\| > \sqrt{m+1}$, there exists a small perturbation $\Delta Z$ satisfying

$$\|\Delta Z\|/\|Z\| < 1.22/(\|P\|^2 - 1)^{1/(2m)}$$

such that $Z - \Delta Z$ has at least one eigenvalue on the boundary $\Gamma$.

Proof: Once again use a unitary similarity to exhibit

$$Z = \begin{pmatrix} A & AR - RB \\ 0 & B \end{pmatrix} \quad\text{and}\quad P = \begin{pmatrix} 0 & -R \\ 0 & I \end{pmatrix},$$

where $B$ is an $m \times m$ matrix whose spectrum lies inside $\Gamma$ and $A$'s spectrum lies outside. Furthermore, we may exploit Autonne's theorem to exhibit $R$ as an $(n-m) \times m$ diagonal matrix with its singular values $\rho_1 \ge \rho_2 \ge \cdots \ge \rho_m \ge 0$ on its main diagonal. (It is convenient here to assume $n - m \ge m$; otherwise swap the roles of $A$ and $B$.) Note that $\|R\| = \rho_1$ and $\|P\| = \sqrt{1 + \rho_1^2}$.




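For any $k$ in $1 \le k \le m$ we may partition

$$R = \begin{pmatrix} \Delta & 0 \\ 0 & M \end{pmatrix} \quad\text{with square } \Delta = \operatorname{diag}(\rho_1, \rho_2, \ldots, \rho_k) \text{ and } M = \operatorname{diag}(\rho_{k+1}, \ldots) \text{ or null},$$

and conformally partition

$$X = AR - RB = \begin{pmatrix} X_{11} & X_{12} \\ X_{21} & X_{22} \end{pmatrix} = \begin{pmatrix} A_{11}\Delta - \Delta B_{11} & A_{12}M - \Delta B_{12} \\ A_{21}\Delta - MB_{21} & A_{22}M - MB_{22} \end{pmatrix}.$$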

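We shall examine a distinguished

$$\Delta Z = \begin{pmatrix} \delta A & 0 \\ 0 & \delta B \end{pmatrix},$$

where

$$\delta A = \begin{pmatrix} X_{11}\Delta^{-1}/2 & 0 \\ A_{21} & 0 \end{pmatrix} \quad\text{and}\quad \delta B = \begin{pmatrix} -\Delta^{-1}X_{11}/2 & B_{12} \\ 0 & 0 \end{pmatrix}$$

are so chosen that

$$A - \delta A = \begin{pmatrix} (A_{11} + \Delta B_{11}\Delta^{-1})/2 & A_{12} \\ 0 & A_{22} \end{pmatrix} \quad\text{and}\quad B - \delta B = \begin{pmatrix} (\Delta^{-1}A_{11}\Delta + B_{11})/2 & 0 \\ B_{21} & B_{22} \end{pmatrix}$$

have in common the $k$ eigenvalues of

$$(A_{11} + \Delta B_{11}\Delta^{-1})/2 = \Delta(\Delta^{-1}A_{11}\Delta + B_{11})\Delta^{-1}/2 .$$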
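The algebra that makes $A - \delta A$ and $B - \delta B$ share $k$ eigenvalues can be checked exactly: with $X_{11} = A_{11}\Delta - \Delta B_{11}$, the leading block of $A - \delta A$ equals $\Delta$ times the leading block of $B - \delta B$ times $\Delta^{-1}$, so the two blocks are similar. A minimal sketch with `fractions.Fraction`, using hypothetical $2\times 2$ blocks ($k = 2$) that are not from the text:

```python
from fractions import Fraction as Fr


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def combine(X, Y, a, b):
    """Return a*X + b*Y."""
    return [[a * p + b * q for p, q in zip(rx, ry)] for rx, ry in zip(X, Y)]


def scale(X, a):
    return [[a * p for p in row] for row in X]


def diag(d):
    return [[d[i] if i == j else Fr(0) for j in range(len(d))] for i in range(len(d))]


# Hypothetical leading blocks; Delta = diag(rho_1, rho_2) with rho_1 >= rho_2 > 0.
A11 = [[Fr(3), Fr(1)], [Fr(-2), Fr(5)]]
B11 = [[Fr(1), Fr(4)], [Fr(0), Fr(-1)]]
rho = [Fr(7), Fr(2)]
D, Dinv = diag(rho), diag([1 / r for r in rho])

X11 = combine(matmul(A11, D), matmul(D, B11), 1, -1)     # A11 Delta - Delta B11
dA11 = scale(matmul(X11, Dinv), Fr(1, 2))               # X11 Delta^{-1} / 2
dB11 = scale(matmul(Dinv, X11), Fr(-1, 2))              # -Delta^{-1} X11 / 2

left = combine(A11, dA11, 1, -1)                         # leading block of A - dA
right = matmul(matmul(D, combine(B11, dB11, 1, -1)), Dinv)  # Delta (block of B - dB) Delta^{-1}
print(left == right)
```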

Consequently, using $\tau\,\Delta Z$ in place of $\Delta Z$ with $0 \le \tau \le 1$, we shall find that the eigenvalues of $Z - \tau\,\Delta Z$ move continuously, as $\tau$ increases from 0 to 1, until $k$ eigenvalues that started inside $\Gamma$ coalesce with $k$ that started outside $\Gamma$. For some $\tau$ between 0 and 1 one of those eigenvalues must cross $\Gamma$, and then $\|\tau\,\Delta Z\| \le \|\Delta Z\|$.

Thus all that remains to be shown is that $\|\Delta Z\|$ satisfies the inequality claimed in the proposition for at least one $k \ge 1$.
[Display illegible in the source scan: it bounds $\|\Delta Z\|^2$ in terms of the blocks $X_{ij}$, $A_{ij}$, $B_{ij}$ and the factor $\rho_k^{-2}(1+\rho_{k+1}^2)$.]


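Let us now choose $k$ to minimize the factor $\rho_k^{-2}(1 + \rho_{k+1}^2)$. Suppose $\theta$ is that minimum value; i.e.

$$\rho_k^{-2}(1 + \rho_{k+1}^2) \ge \theta \quad\text{for } k = 1, 2, \ldots, m \quad (\rho_{m+1} = 0).$$

Then

$$\rho_m^2 \le \theta^{-1}, \qquad \rho_{m-1}^2 \le \theta^{-1}(1 + \rho_m^2) \le \theta^{-1} + \theta^{-2}, \qquad \ldots, \qquad \rho_1^2 \le \theta^{-1}(1 + \rho_2^2) \le \theta^{-1} + \theta^{-2} + \cdots + \theta^{-m};$$

evidently $\theta$ is no bigger than the positive root $\hat\theta$ of

$$\|P\|^2 = 1 + \rho_1^2 = 1 + \hat\theta^{-1} + \hat\theta^{-2} + \cdots + \hat\theta^{-m} .$$

When $\rho_1^2 > m$ we must have $\hat\theta^{-1} > 1$ and hence $\rho_1^2 \le m\hat\theta^{-m}$, whence

$$\theta \le \hat\theta \le (m/\rho_1^2)^{1/m} \le c\,(\|P\|^2 - 1)^{-1/m}, \quad\text{where } m^{1/m} \le c = e^{1/e} \approx 1.445 .$$

The claimed result follows.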
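The last step of the proof, bounding $\theta$ by the root $\hat\theta$, can be checked numerically. A sketch, assuming random singular values $\rho_1 \ge \cdots \ge \rho_m$ drawn from an arbitrary range chosen so that $\rho_1^2 > m$; it finds $\hat\theta$ by bisection (our choice of method) and tests $\theta \le \hat\theta \le c(\|P\|^2-1)^{-1/m}$ with $c = e^{1/e}$.

```python
import math
import random


def theta_min(rhos):
    """min over k of rho_k^(-2) (1 + rho_(k+1)^2), with rho_(m+1) = 0."""
    ext = list(rhos) + [0.0]
    return min((1 + ext[k + 1] ** 2) / ext[k] ** 2 for k in range(len(rhos)))


def theta_hat(p2, m, iters=200):
    """Positive root t of p2 = 1 + t^-1 + ... + t^-m (requires p2 > 1), by bisection.
    The right-hand side decreases strictly for t > 0."""
    f = lambda t: 1 + sum(t ** (-j) for j in range(1, m + 1)) - p2
    lo, hi = 1 / p2, max(1.0, 2 * m / (p2 - 1))   # f(lo) > 0 > f(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


C_CONST = math.exp(1 / math.e)   # e^(1/e) = 1.4446..., bounds m^(1/m) for every m >= 1


def check(rhos):
    """For descending rhos with rho_1^2 > m: theta <= theta_hat <= c (||P||^2 - 1)^(-1/m)."""
    m = len(rhos)
    p2 = 1 + rhos[0] ** 2                       # ||P||^2 = 1 + rho_1^2
    th, thh = theta_min(rhos), theta_hat(p2, m)
    return th <= thh * (1 + 1e-12), thh <= C_CONST * (p2 - 1) ** (-1 / m)


rng = random.Random(7)
for m in (1, 2, 3, 6):
    rhos = sorted((rng.uniform(3.0, 40.0) for _ in range(m)), reverse=True)
    print(m, check(rhos))
```

Bisection is used only because the right-hand side is monotone in $\hat\theta$; any root finder would do.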


This proposition seems to overestimate $\|\Delta Z\|$ grossly. Indeed, if $P$ has $k$ large singular values and the rest small, say $\sqrt{1+\rho_k^2}/\sqrt{1+\rho_{k+1}^2} \gg 1$, then the proof above yields $\|\Delta Z\| \le \rho_k^{-1}\sqrt{1+\rho_{k+1}^2}\,\|Z\|$, which is far smaller than claimed in the proposition. Another example of overestimation arises when a similarity (perhaps not unitary) of modest condition number (see below) succeeds in diagonalizing $A$ and $B$ without erasing the block $AR - RB$. It is possible to show then that $\|\Delta Z\|$ need not much exceed $\|Z\|/\|P\|$ when $\|P\|$ is large; this claim will not be proved here.

Next we shall consider the condition number $\kappa(Q) = \|Q\|\,\|Q^{-1}\|$ of similarity transformations that reduce $Z$ to the block diagonal form

$$Q^{-1}ZQ = \begin{pmatrix} * & 0 \\ 0 & * \end{pmatrix}.$$

There are many similarities which reduce $Z$ to this form, and we shall be particularly interested in the ones whose condition numbers are roughly minimal. Experience teaches us that if the minimal condition number is huge then the reduction of $Z$ will be hypersensitive to rounding errors and other perturbations and uncertainties; see Wilkinson (1965) p. 87.

Proposition III.3.3: Let $\Gamma$, $Z$, $m$ and $P$ be as in the previous Proposition III.3.2. When $\|P\|$ is huge every similarity $Q$ which reduces $Z$ to block diagonal form, with one block for the $m$ eigenvalues inside $\Gamma$ and the other block for those outside, must be ill-conditioned: $\kappa(Q) \ge \|P\|$. Conversely, if every such similarity is ill-conditioned then $\|P\|$ must be big, because for some such similarities $\|P\| \ge \kappa(Q)/4$.

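Proof: Once again use a unitary similarity (which does not aggravate the condition numbers) to exhibit $Z$ in the block triangular form used in Proposition III.3.2. Any eligible similarity $Q$ must exhibit two blocks, one similar to $A$ and the other to $B$. Consequently, every such $Q$ must have the form

$$Q^{-1}ZQ = \begin{pmatrix} S^{-1}AS & 0 \\ 0 & T^{-1}BT \end{pmatrix}, \quad\text{whence}\quad Q = \begin{pmatrix} S & -RT \\ 0 & T \end{pmatrix} \quad\text{and}\quad Q^{-1} = \begin{pmatrix} S^{-1} & S^{-1}R \\ 0 & T^{-1} \end{pmatrix}.$$

Now $\|(I \;\; R)\| = \left\|\begin{pmatrix} -R \\ I \end{pmatrix}\right\| = \|P\|$, so

$$\|Q\| \ge \|S\|, \qquad \|Q\| \ge \left\|\begin{pmatrix} -R \\ I \end{pmatrix} T\right\| \ge \|P\|/\|T^{-1}\|,$$

and

$$\|Q^{-1}\| \ge \|T^{-1}\|, \qquad \|Q^{-1}\| \ge \|S^{-1}(I \;\; R)\| \ge \|P\|/\|S\| .$$

Therefore

$$4\kappa(Q) = 4\|Q\|\,\|Q^{-1}\| \ge \left(\|S\| + \|P\|/\|T^{-1}\|\right)\left(\|T^{-1}\| + \|P\|/\|S\|\right) \ge 4\|P\|,$$

as claimed; the last step holds because $(a + p/b)(b + p/a) = ab + 2p + p^2/(ab) \ge 4p$ for positive $a$, $b$, $p$.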

On the other hand, if we choose for $S$ and $T$ any matrices which satisfy $S^*S = \sigma^2 I$ and $T^*T = \tau^2 I$ for constants $\sigma$ and $\tau$ that satisfy $\sigma/\tau = \|P\|$, we find that

$$\kappa(Q) = \|Q\|\,\|Q^{-1}\| \le \left(\|S\| + \|P\|\,\|T\|\right)\left(\|T^{-1}\| + \|S^{-1}\|\,\|P\|\right) = \left(\sigma + \tau\|P\|\right)\left(\tau^{-1} + \sigma^{-1}\|P\|\right) = 4\|P\|,$$

as claimed.
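Both halves of Proposition III.3.3 can be seen in the smallest case $n = 2$, $m = 1$, where $S$, $T$, $R$ are scalars and $\|P\| = \sqrt{1+R^2}$. The sketch computes the spectral condition number of a real $2\times 2$ matrix in closed form and samples random $S$, $T$, $R$; the sampling ranges and helper names are arbitrary choices of ours.

```python
import math
import random


def cond2(M):
    """Spectral condition number of a real nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = M
    fro2 = a * a + b * b + c * c + d * d
    det = abs(a * d - b * c)
    smax2 = 0.5 * (fro2 + math.sqrt(max(fro2 * fro2 - 4 * det * det, 0.0)))
    return smax2 / det            # smax / smin, because smax * smin = |det|


def kappa_Q(s, t, r):
    """kappa(Q) for Q = [[S, -R T], [0, T]] in the scalar case n = 2, m = 1."""
    return cond2([[s, -r * t], [0.0, t]])


def extreme_ratios(trials=2000, seed=3):
    """min of kappa(Q)/||P|| over random S, T, R, and max of kappa(Q)/(4||P||)
    when S/T = ||P|| (the choice made in the converse)."""
    rng = random.Random(seed)
    low, high = math.inf, 0.0
    for _ in range(trials):
        r = rng.uniform(-50.0, 50.0)
        p = math.sqrt(1 + r * r)                    # ||P|| for P = [[0, -r], [0, 1]]
        s, t = rng.uniform(0.01, 10.0), rng.uniform(0.01, 10.0)
        low = min(low, kappa_Q(s, t, r) / p)
        high = max(high, kappa_Q(p * t, t, r) / (4 * p))
    return low, high


print(extreme_ratios())
```

The first ratio should never drop below 1 and the second should never exceed 1, matching $\|P\| \le \kappa(Q)$ and, for the special choice, $\kappa(Q) \le 4\|P\|$.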

III.4 The nearest nilpotent matrix

Suppose we have identified every cluster of $Z$'s eigenvalues to which belongs a spectral projector of moderate norm, and no such cluster may be broken up without introducing huge spectral projectors. We could perform a unitary similarity which exhibits $Z$ in block-upper-triangular form with one diagonal block for each cluster. What should be done next?

In a sense, each such block resists further reduction as if it were an approximation to a truly irreducible block, namely a block with only one multiple eigenvalue. The purpose of what follows is to discuss how to locate that irreducible block in the hope that we may replace an ill-behaved cluster of eigenvalues by a well-behaved multiple eigenvalue without appreciably changing the given matrix.

Problem III.4.1: Given an $m \times m$ block $B$, find the nearest matrix $\beta + C$ with only one eigenvalue $\beta$; $C$ must be nilpotent. By "nearest" we mean: minimize $\|B - \beta - C\|_F$.

It is not hard to find the best value for $\beta$; write $\beta = \operatorname{tr}(B)/m + \delta$ and observe that

$$\|B - \beta - C\|_F^2 = \|B - \operatorname{tr}(B)/m - C\|_F^2 + m|\delta|^2,$$

since $\operatorname{tr}(B - \operatorname{tr}(B)/m) = 0$ and $\operatorname{tr}(C) = 0$. Therefore the best value for $\beta$ is

$$\beta = \operatorname{tr}(B)/m \quad (\text{cf. III.2});$$

and from the same observation we deduce that the nilpotent matrix $C$ nearest to $B - \beta$ is independent of $\delta$. That at least one such nearest nilpotent $C$ exists follows from the fact that we need only search for the matrix in the compact set of nilpotents $C$ which also satisfy

$$\|B - \beta - C\|_F^2 \le \|B - \beta - 0\|_F^2,$$

since there is no need to look at anything farther away than the nilpotent $0$.
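The splitting of $\|B-\beta-C\|_F^2$ used above is easy to confirm numerically. A minimal sketch with a random complex $B$, a random strictly upper triangular (hence nilpotent and trace-free) $C$ and a random shift $\delta$; all data and helper names are illustrative.

```python
import random


def fro2(M):
    """Squared Frobenius norm."""
    return sum(abs(v) ** 2 for row in M for v in row)


def residual(B, beta, C):
    """B - beta*I - C."""
    m = len(B)
    return [[B[i][j] - (beta if i == j else 0) - C[i][j] for j in range(m)] for i in range(m)]


def split_check(m=4, seed=0):
    rng = random.Random(seed)
    rand = lambda: complex(rng.gauss(0, 1), rng.gauss(0, 1))
    B = [[rand() for _ in range(m)] for _ in range(m)]
    # strictly upper triangular, hence nilpotent with tr(C) = 0
    C = [[rand() if j > i else 0j for j in range(m)] for i in range(m)]
    mean = sum(B[i][i] for i in range(m)) / m          # tr(B)/m
    delta = rand()
    lhs = fro2(residual(B, mean + delta, C))
    rhs = fro2(residual(B, mean, C)) + m * abs(delta) ** 2
    return lhs, rhs


print(split_check())
```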

Let us imagine that the best $C$ has been found, and choose a new set of orthogonal coordinates to exhibit $C$ in upper triangular form. Since $C$ is nilpotent it is strictly upper triangular. Since $C$ is closest to $B - \beta$, and every strictly upper triangular matrix is nilpotent, $B - \beta - C$ must be lower triangular in this coordinate system, and that lower triangle must have the minimum norm of all lower triangles of matrices unitarily similar to $B - \beta$. Since the norm of all of $B - \beta$ is unchanged by unitary similarity, we have the following result:

Proposition III.4.2: Given an $m \times m$ matrix $B$, the nearest matrix $\beta + C$ with only one eigenvalue $\beta$ can be constructed as follows. Of all matrices $U^*BU$ unitarily similar to $B$, choose one whose super-diagonal elements have the largest sum of squared magnitudes; call it $F = U^*BU$. Annihilate all the sub-diagonal elements of $F$ to get $E$. Its diagonal elements will all be the same, namely $\beta$ (this is not obvious; see below). Then

$$\beta + C = UEU^* .$$

To prove that all the diagonal elements of $E$ are the same we need only consider its $2 \times 2$ principal submatrices with adjacent rows and columns. Each such submatrix must be such that no $2 \times 2$ unitary similarity can increase its super-diagonal element. A modest calculation shows that this implies its two diagonal elements are equal. I am indebted to Alan J. Hoffman for suggesting this simple approach to what used to be a much more complicated proof. That proof, which used variational methods, also showed that $(B - \beta - C)^*$ must be a polynomial in $C$, and that if $C^k = 0$ then $k \ge (m+1)/2$, but these facts seem not to help the search for $C$.
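The "modest calculation" is not reproduced in the source. For the special case of a real $2\times 2$ submatrix and real plane rotations $G(\theta) = \begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$ (an assumption; the text allows complex unitary similarities), it can be sketched by splitting the submatrix into its scalar, symmetric trace-free and skew parts:

```latex
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
  = \tfrac{a+d}{2}\, I
  + \begin{pmatrix} u & h \\ h & -u \end{pmatrix}
  + \begin{pmatrix} 0 & s \\ -s & 0 \end{pmatrix},
\qquad u = \tfrac{a-d}{2},\quad h = \tfrac{b+c}{2},\quad s = \tfrac{b-c}{2}.

% G^T (.) G fixes the scalar and skew parts and rotates (u, h):
(u', h') = (u\cos 2\theta + h\sin 2\theta,\; h\cos 2\theta - u\sin 2\theta),
\qquad u'^2 + h'^2 = u^2 + h^2 .

% The new super-diagonal element is b' = h' + s, so
|b'| \le \sqrt{u^2 + h^2} + |s|,
\quad\text{with equality only if } |h'| = \sqrt{u^2 + h^2},
\text{ i.e. } u' = 0,\ a' = d' .
```

So in this real case a super-diagonal element that cannot be increased forces equal diagonal elements.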


Proposition III.4.2 suggests that $C$ might be constructed via a sequence of $2 \times 2$ Jacobi rotations, each designed to enhance the magnitudes of super-diagonal elements. Such a scheme works immediately when $m = 2$, may work well when $m = 3$, and seems to be intolerably slow for $m \ge 4$. There is ample scope for further research.
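A minimal sketch of such a Jacobi scheme, for real $B$ and real rotations in adjacent planes only (both restrictions are ours, and the sweep can stall at a local rather than a global maximum). Each step applies the rotation in rows and columns $i, i+1$ that maximizes the square of the new element $F_{i,i+1}$; such a rotation leaves the sum of squares of the other strictly upper elements unchanged, so $\|B-\beta-C\|_F$ never increases. The function name and its return values are hypothetical.

```python
import math


def nearest_single_eigenvalue(B, max_sweeps=500, tol=1e-15):
    """Jacobi-style sketch for Problem III.4.1 with real B and real rotations only.

    Rotations G in the adjacent planes (i, i+1) update F <- G^T F G, choosing each
    angle so that the new super-diagonal element F[i][i+1] has the largest possible
    magnitude. Returns (beta + C in the original coordinates, ||B - beta - C||_F, F).
    """
    m = len(B)
    F = [[float(v) for v in row] for row in B]
    Q = [[1.0 if i == j else 0.0 for j in range(m)] for i in range(m)]  # F = Q^T B Q
    total = sum(v * v for row in F for v in row) or 1.0
    for _ in range(max_sweeps):
        gain = 0.0
        for i in range(m - 1):
            j = i + 1
            a, b, c, d = F[i][i], F[i][j], F[j][i], F[j][j]
            u, h, s = (a - d) / 2, (b + c) / 2, (b - c) / 2  # skew part s is invariant
            r0 = math.hypot(u, h)
            if r0 == 0.0:
                continue
            sg = 1.0 if s >= 0 else -1.0
            theta = 0.5 * math.atan2(-sg * u, sg * h)        # gives u' = 0, h' = sg*r0
            co, si = math.cos(theta), math.sin(theta)
            for k in range(m):                               # F <- F G and Q <- Q G
                for M in (F, Q):
                    p, q = M[k][i], M[k][j]
                    M[k][i], M[k][j] = co * p + si * q, -si * p + co * q
            for k in range(m):                               # F <- G^T F
                p, q = F[i][k], F[j][k]
                F[i][k], F[j][k] = co * p + si * q, -si * p + co * q
            gain += (r0 + abs(s)) ** 2 - b * b
        if gain <= tol * total:
            break
    beta = sum(F[i][i] for i in range(m)) / m
    E = [[beta if i == j else (F[i][j] if j > i else 0.0) for j in range(m)] for i in range(m)]
    QE = [[sum(Q[r][k] * E[k][col] for k in range(m)) for col in range(m)] for r in range(m)]
    approx = [[sum(QE[r][k] * Q[col][k] for k in range(m)) for col in range(m)]
              for r in range(m)]                             # Q E Q^T = beta + C
    dist = math.sqrt(sum((F[i][j] - E[i][j]) ** 2 for i in range(m) for j in range(m)))
    return approx, dist, F


approx, dist, F = nearest_single_eigenvalue([[1.0, 2.0], [0.0, 3.0]])
print(dist)
```

For the $2\times 2$ example one rotation suffices: both diagonal elements become $2$ and the distance is $\sqrt{2}-1$, consistent with the remark that the scheme works immediately when $m = 2$.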